
Falcon 7B LLM Fine Tune Model

Model description

This model is a fine-tuned version of tiiuae/falcon-7b, trained with QLoRA (LoRA adapters on top of a 4-bit quantized base model) using the PEFT and bitsandbytes libraries.

Intended uses & limitations

How to use

  • The model and tokenizer are loaded using the from_pretrained methods.
  • The tokenizer's padding token is set to the end-of-sequence (EOS) token.
  • The generation_config sets the generation parameters, such as the maximum number of new tokens, the sampling temperature, and top_p.
  • The prompt is defined, encoded using the tokenizer, and passed to the model.generate method to generate a response.
  • The generated response is decoded using the tokenizer and printed.
# Import necessary classes and functions
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

# Specify the model
PEFT_MODEL = "hipnologo/falcon-7b-qlora-finetune-chatbot"

# Load the PEFT config
config = PeftConfig.from_pretrained(PEFT_MODEL)

# 4-bit quantization settings, matching the values listed under "Training procedure"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Set the padding token to be the same as the EOS token
tokenizer.pad_token = tokenizer.eos_token

# Load the PEFT model
model = PeftModel.from_pretrained(model, PEFT_MODEL)

# Set the generation parameters
# Note: temperature and top_p only take effect when sampling is enabled (do_sample=True)
generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

# Define the prompt
prompt = """
<human>: How can I create an account?
<assistant>:
""".strip()
print(prompt)

# Encode the prompt
encoding = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )

# Print the generated response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
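The decoded output of a causal language model contains the prompt followed by the continuation, and the model may keep writing a further <human>: turn after its answer. The sketch below shows one way to keep only the assistant's reply. The helper name extract_reply, the stop_marker parameter, and the sample decoded string are hypothetical and used only for illustration; they are not part of this model or of any library.

```python
def extract_reply(decoded: str, prompt: str, stop_marker: str = "<human>:") -> str:
    """Return only the assistant's reply from decoded generate() output.

    Hypothetical helper: assumes the <human>:/<assistant>: prompt format
    used in the example above.
    """
    # Drop the prompt when the decoded text starts with it; otherwise
    # fall back to the full decoded text.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Cut at the next human turn if the model generated past its answer.
    cut = text.find(stop_marker)
    if cut != -1:
        text = text[:cut]
    return text.strip()


prompt = "<human>: How can I create an account?\n<assistant>:"
# Made-up decoded output, standing in for tokenizer.decode(outputs[0], ...)
decoded = prompt + " Click 'Sign Up' and follow the steps.\n<human>: Thanks!"
reply = extract_reply(decoded, prompt)
print(reply)
```

If the tokenizer does not reproduce the prompt string exactly when decoding, the startswith check fails and the helper only trims at the stop marker, so the result should be inspected before relying on it.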

Training procedure

The model was fine-tuned on the Ecommerce-FAQ-Chatbot-Dataset with the following bitsandbytes quantization config:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16

Framework versions

  • PEFT 0.4.0.dev0

Evaluation results

The model was trained for 80 steps. The training loss decreased from 0.184 to a final value of 0.03094411873175886.

  • Trainable params: 2359296
  • All params: 3611104128
  • Trainable%: 0.06533447711203746
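The trainable percentage follows directly from the two parameter counts above. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Parameter counts as reported above
trainable_params = 2_359_296
all_params = 3_611_104_128

# Share of parameters updated during fine-tuning, as a percentage
trainable_pct = 100 * trainable_params / all_params

print(f"trainable params: {trainable_params:,}")
print(f"all params: {all_params:,}")
print(f"trainable%: {trainable_pct:.4f}")
```

Only about 0.065% of the counted parameters are updated, which is what keeps adapter fine-tuning inexpensive compared with full fine-tuning.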

License

This model is licensed under Apache 2.0. Please see the LICENSE for more information.
