---
license: mit
datasets:
- mlabonne/guanaco-llama2-1k
language:
- en
metrics:
- bleu
library_name: adapter-transformers
tags:
- chemistry
- biology
- finance
- legal
- music
- art
- code
- climate
- medical
- text-generation-inference
---
# Deployed Model

**AjayMukundS/Llama-2-7b-chat-finetune**

## Model Description
This is a Llama 2 model with 7 billion parameters, fine-tuned on the mlabonne/guanaco-llama2-1k dataset. The training data consists of conversations between a human and an assistant, in which the human poses queries and the assistant responds to them in a suitable fashion. Llama 2 chat models use the following chat template:
```
<s>[INST] <<SYS>>
System prompt
<</SYS>>

User prompt [/INST] Model answer </s>
```
- **System prompt** (optional): guides the model.
- **User prompt** (required): carries the instruction or user query.
- **Model answer** (required): the response the model learns to produce.
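As a rough illustration, a small helper like the one below assembles a prompt in this template. The function name and defaults are hypothetical, not part of the actual training code:

```python
# Hypothetical helper: wraps an optional system prompt and a user query
# in the Llama 2 chat template shown above.
def format_prompt(user_prompt: str, system_prompt: str = "") -> str:
    if system_prompt:
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_prompt} [/INST]"
        )
    return f"<s>[INST] {user_prompt} [/INST]"

print(format_prompt("What is QLoRA?", "You are a helpful assistant."))
```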
## Training Data
The instruction dataset is reformatted to follow the Llama 2 template above.
- Original dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
- Reformatted dataset with 1K samples: https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k
- Complete reformatted dataset: https://huggingface.co/datasets/mlabonne/guanaco-llama2
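Loading the 1K-sample split is a one-liner with the `datasets` library. The sketch below assumes each row exposes a single pre-formatted `text` field, which is how the reformatted dataset is structured:

```python
from datasets import load_dataset

# Load the 1K-sample reformatted dataset from the Hugging Face Hub.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Each row carries one fully formatted conversation in the "text" field.
print(dataset[0]["text"])
```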
To drastically reduce VRAM usage, the model is fine-tuned in 4-bit precision, which is why QLoRA is used here. Fine-tuning was performed on an L4 GPU (Google Colab Pro).
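For reference, a typical 4-bit quantization setup with `transformers`' `BitsAndBytesConfig` looks like the following. The NF4 settings here are common QLoRA defaults, not necessarily the exact values used for this model:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config for QLoRA (illustrative defaults).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,   # compute in fp16
    bnb_4bit_use_double_quant=False,        # no nested quantization
)
```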
## Process
- Load the dataset as defined above.
- Configure bitsandbytes for 4-bit quantization.
- Load the Llama 2 model in 4-bit precision on the GPU (L4, Google Colab Pro), along with the corresponding tokenizer.
- Load the QLoRA configuration, set the regular training parameters, and pass everything to the SFTTrainer.
- Start fine-tuning (see the end-to-end sketch after this list).
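A condensed, self-contained sketch of these steps is shown below. The base model checkpoint, LoRA hyperparameters, and training arguments are illustrative assumptions (the card does not state them), and the `SFTTrainer` call assumes an older `trl` API that accepts `dataset_text_field` and `max_seq_length` directly; newer `trl` versions move these into `SFTConfig`:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# Assumption: an ungated Llama 2 chat mirror; swap in your own checkpoint.
base_model = "NousResearch/Llama-2-7b-chat-hf"

# Step 1: load the reformatted dataset.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Step 2: 4-bit quantization config (see above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Step 3: load the model in 4-bit precision plus its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Step 4: QLoRA adapter config and training arguments (illustrative values).
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=25,
)

# Step 5: hand everything to the SFTTrainer and start fine-tuning.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```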