# Deployed Model

AjayMukundS/Llama-2-7b-chat-finetune

## Model Description

This is a Llama 2 model with 7 billion parameters, fine-tuned on the **mlabonne/guanaco-llama2** dataset. The training data is a chat between a human and an assistant, in which the human poses queries and the assistant responds to them in a suitable fashion.
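A minimal sketch of querying this checkpoint for inference with the `transformers` pipeline API (the prompt string follows the chat template described below; generation settings are illustrative, not prescribed by this card):

```python
# Minimal sketch: querying the deployed checkpoint with the transformers
# pipeline API (requires a GPU with enough memory for a 7B model).
from transformers import pipeline

generator = pipeline("text-generation", model="AjayMukundS/Llama-2-7b-chat-finetune")
prompt = "<s>[INST] What is a large language model? [/INST]"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```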
In the case of Llama 2, the following chat template is used for the chat models:

**[INST] SYSTEM PROMPT**

System prompt (optional) --> to guide the model

**USER PROMPT [/INST]**

User prompt (required) --> to give the instruction / User Query

**MODEL ANSWER**

Model Answer (required)
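For concreteness, here is a short sketch of how one sample can be wrapped in this template; `format_sample` is a hypothetical helper (not part of the training code), and the `<s>` and `<<SYS>>` markers are the special tokens Llama 2 chat models expect:

```python
# Hypothetical helper (not from the training code): wraps one query/answer
# pair, plus an optional system prompt, in the Llama 2 chat template above.

def format_sample(user_prompt: str, model_answer: str, system_prompt: str = "") -> str:
    """Return a single training string in Llama 2 chat markup."""
    if system_prompt:
        # The optional system prompt sits inside <<SYS>> ... <</SYS>> tags.
        user_prompt = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt}"
    return f"<s>[INST] {user_prompt} [/INST] {model_answer} </s>"

print(format_sample(
    "What is QLoRA?",
    "QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters.",
))
# -> <s>[INST] What is QLoRA? [/INST] QLoRA fine-tunes a ... </s>
```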
## Training Data

The instruction dataset is reformatted to follow the Llama 2 template above.
**Original Dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco

**Reformatted Dataset with 1K Samples** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

**Complete Reformatted Dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2
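A minimal sketch of pulling the reformatted 1K-sample split with the Hugging Face `datasets` library; each row is one pre-formatted chat string:

```python
# Minimal sketch: loading the reformatted 1K-sample dataset.
from datasets import load_dataset

dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
print(dataset.column_names)     # ['text']
print(dataset[0]["text"][:80])  # begins with "<s>[INST] ..."
```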
To drastically reduce VRAM usage, the model must be fine-tuned in 4-bit precision, which is why QLoRA is used here. The GPU on which the model was fine-tuned was an **L4 (Google Colab Pro)**.
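As an illustration, a typical QLoRA 4-bit quantization config looks like the following; the values shown are common defaults and an assumption, not necessarily the exact settings of this run:

```python
# Sketch of a typical QLoRA 4-bit quantization config (assumed values,
# not necessarily the exact settings used for this model).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16 on the L4
)
```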
## Process

1) Load the dataset as defined above.
2) Configure bitsandbytes for 4-bit quantization.
3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
4) Load the QLoRA configuration and the regular training parameters, and pass everything to the SFTTrainer (see the sketch after this list).
5) Fine-tuning starts.
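Putting the steps together, here is a condensed sketch of the fine-tuning run. The base checkpoint name and all hyperparameter values are assumptions for illustration (this card does not state them), and the `SFTTrainer` keyword arguments follow the older TRL (pre-0.9) API that was current for 2023-era Llama 2 tutorials:

```python
# Condensed sketch of steps 1-5. Base checkpoint and hyperparameters are
# illustrative assumptions; SFTTrainer kwargs follow the pre-0.9 TRL API.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"  # assumption: ungated Llama 2 chat mirror

# 1) Load the dataset as defined.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# 2) Configure bitsandbytes for 4-bit quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# 3) Load the model in 4-bit precision with the corresponding tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# 4) QLoRA configuration and training parameters, passed to the SFTTrainer.
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1,
    bias="none", task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # each row is one pre-formatted chat string
    tokenizer=tokenizer,
    args=training_args,
)

# 5) Fine-tuning starts.
trainer.train()
trainer.model.save_pretrained("Llama-2-7b-chat-finetune")
```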
|