metadata

language:
  - en
license: apache-2.0
library_name: peft
tags:
  - text-generation-inference
datasets:
  - Abirate/english_quotes
pipeline_tag: text-generation
base_model: EleutherAI/gpt-neox-20b

hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset

Training procedure

The following bitsandbytes quantization config was used during training:

load_in_8bit: False
load_in_4bit: True
llm_int8_threshold: 6.0
llm_int8_skip_modules: None
llm_int8_enable_fp32_cpu_offload: False
llm_int8_has_fp16_weight: False
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: bfloat16
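
As a rough illustration of what these settings imply for memory, the sketch below estimates the storage cost of NF4 weights with and without double quantization. The block sizes (64 weights per block, 256 first-level constants per second-level block) follow the QLoRA paper's description, and the 20 billion parameter count is taken from the model name. None of these values are read from this checkpoint, and layers kept in higher precision are ignored.

```python
# Back-of-the-envelope estimate of 4-bit weight storage.
# All constants below are assumptions, not values read from the checkpoint.
NOMINAL_PARAMS = 20e9        # assumed parameter count, from the "20b" in the model name
WEIGHT_BITS = 4              # bnb_4bit_quant_type: nf4 stores 4 bits per weight
BLOCK_SIZE = 64              # assumed weights per quantization block
SECOND_LEVEL_BLOCK = 256     # assumed first-level constants per second-level block


def bits_per_param(double_quant: bool) -> float:
    """Average bits per weight, including quantization constants."""
    if double_quant:
        # 8-bit first-level constants plus one 32-bit constant per 256 of them
        overhead = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * SECOND_LEVEL_BLOCK)
    else:
        # one 32-bit constant per block of weights
        overhead = 32 / BLOCK_SIZE
    return WEIGHT_BITS + overhead


def estimated_gigabytes(double_quant: bool) -> float:
    """Approximate weight storage in GB (10**9 bytes)."""
    return NOMINAL_PARAMS * bits_per_param(double_quant) / 8 / 1e9


for flag in (False, True):
    print(f"double_quant={flag}: {bits_per_param(flag):.3f} bits/param, "
          f"~{estimated_gigabytes(flag):.2f} GB")
```

Under these assumptions, double quantization lowers the constant overhead from 0.5 to roughly 0.127 bits per parameter.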

Model description

This model is a fine-tuned version of EleutherAI/gpt-neox-20b, trained on the Abirate/english_quotes dataset with QLoRA (LoRA adapters on top of a 4-bit quantized base model) through the PEFT library.

How to use

The code below performs the following steps:

Imports the necessary libraries: torch and classes from the transformers library.
Specifies base_model_id as "EleutherAI/gpt-neox-20b" and model_id as "hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset".
Defines a BitsAndBytesConfig object named bnb_config with the following configuration:
- load_in_4bit set to True
- bnb_4bit_use_double_quant set to True
- bnb_4bit_quant_type set to "nf4"
- bnb_4bit_compute_dtype set to torch.bfloat16
Initializes an AutoTokenizer object named tokenizer by loading the tokenizer for the base model (base_model_id, "EleutherAI/gpt-neox-20b").
Initializes an AutoModelForCausalLM object named model by loading the fine-tuned model for the specified model_id with quantization_config set to bnb_config. The device_map argument places the whole model on GPU 0 (cuda:0).
Defines a variable text with the value "Twenty years from now".
Defines a variable device with the value "cuda:0", representing the device on which the model will be executed.
Encodes the text using the tokenizer and converts it to a PyTorch tensor, assigning it to the inputs variable. The tensor is moved to the specified device.
Generates text using the model.generate method by passing the inputs tensor and setting the max_new_tokens parameter to 20. The generated output is assigned to the outputs variable.
Decodes the outputs tensor using the tokenizer to obtain the generated text without special tokens, and assigns it to the generated_text variable.
Prints the generated_text.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the tokenizer from the base model
base_model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Fine-tuned model ID and 4-bit quantization config (same settings as in training)
model_id = "hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the fine-tuned model
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})

text = "Twenty years from now"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
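
If the repository contains only LoRA adapter weights (an assumption, suggested by library_name: peft in the metadata), the adapter can also be attached to a quantized base model explicitly through PEFT. The sketch below shows one way to do that. The function name load_quantized_with_adapter is hypothetical, and calling it requires the peft and bitsandbytes packages and a CUDA GPU.

```python
def load_quantized_with_adapter(
    base_model_id: str = "EleutherAI/gpt-neox-20b",
    adapter_id: str = "hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset",
):
    """Load the 4-bit base model, then attach the LoRA adapter (hypothetical helper)."""
    # Imports are local so the heavy libraries are only needed when the function is called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id, quantization_config=bnb_config, device_map={"": 0}
    )
    # Assumes adapter_id points at a PEFT adapter (adapter_config.json plus weights).
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling tokenizer, model = load_quantized_with_adapter() would stand in for the loading steps in the snippet above; tokenization and generation proceed as before.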

Framework versions

PEFT 0.4.0.dev0

Training procedure

trainable params: 8650752
all params: 10597552128
trainable%: 0.08162971878329976
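
The percentage follows directly from the two counts. A quick check in Python, using only the numbers reported above:

```python
# Counts as reported above.
trainable_params = 8_650_752
all_params = 10_597_552_128

# Share of parameters updated during fine-tuning, as a percentage.
trainable_pct = 100 * trainable_params / all_params
print(f"trainable%: {trainable_pct:.4f}")
```

In other words, fewer than one in a thousand of the counted parameters were updated during fine-tuning.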

License

This model is licensed under Apache 2.0. Please see the LICENSE for more information.