---
license: mit
datasets:
- mlabonne/guanaco-llama2-1k
language:
- en
metrics:
- bleu
library_name: adapter-transformers
tags:
- chemistry
- biology
- finance
- legal
- music
- art
- code
- climate
- medical
- text-generation-inference
---

# Deployed Model

AjayMukundS/Llama-2-7b-chat-finetune

## Model Description

This is a Llama 2 model with 7 billion parameters, fine-tuned on the dataset from **mlabonne/guanaco-llama2**. The training data is a chat between a human and an assistant, where the human poses queries and the assistant responds to them in a suitable fashion.

For its chat models, Llama 2 uses the following chat template:

**[INST] System Prompt**

**User Prompt [/INST] Model Answer**

- System Prompt (optional) --> guides the model
- User Prompt (required) --> gives the instruction / user query
- Model Answer (required)

## Training Data

The instruction dataset is reformatted to follow the Llama 2 template above (an illustrative formatted sample is shown below).

**Original dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco

**Reformatted dataset with 1K samples** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

**Complete reformatted dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2

To drastically reduce VRAM usage, the model must be fine-tuned in 4-bit precision, which is why QLoRA is used here. The GPU on which the model was fine-tuned was an **L4 (Google Colab Pro)**.

## Process

1) Load the dataset as defined above.
2) Configure bitsandbytes for 4-bit quantization.
3) Load the Llama 2 model in 4-bit precision on the GPU (L4 - Google Colab Pro), along with the corresponding tokenizer.
4) Load the QLoRA configuration and the regular training parameters, and pass everything to the SFTTrainer (see the training sketch below).
5) Fine-tuning starts...
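
The process above corresponds to the standard QLoRA recipe built on `transformers`, `bitsandbytes`, `peft`, and `trl`. The following is a minimal sketch of that pipeline, not the exact training script used for this model: the base-model identifier and all hyperparameter values (LoRA rank, learning rate, batch size, etc.) are illustrative assumptions, and the `SFTTrainer` keyword arguments follow older `trl` releases (newer versions expect `dataset_text_field`, `max_seq_length`, and `packing` inside an `SFTConfig`).

```python
# Minimal QLoRA fine-tuning sketch. Model name and hyperparameters are
# illustrative assumptions, not the exact settings of the deployed model.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"   # assumed base checkpoint
dataset_name = "mlabonne/guanaco-llama2-1k"
new_model = "Llama-2-7b-chat-finetune"

# 1) Load the reformatted instruction dataset.
dataset = load_dataset(dataset_name, split="train")

# 2) bitsandbytes configuration for 4-bit (NF4) quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# 3) Load the base model in 4-bit precision and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map={"": 0},
)
model.config.use_cache = False
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 4) QLoRA adapter configuration and regular training parameters.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=25,
)

# 5) Pass everything to the SFTTrainer and start fine-tuning.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_args,
    packing=False,
)
trainer.train()
trainer.model.save_pretrained(new_model)
```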
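
For illustration, a single reformatted training sample (the dataset's `text` field) looks roughly like the following. The question and answer here are invented placeholders, and the exact whitespace and BOS/EOS handling depend on the reformatting script:

```
<s>[INST] What is the capital of France? [/INST] The capital of France is Paris. </s>
```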