AjayMukundS committed
Commit fc46b6c
1 Parent(s): 384ab79

Update README.md

Files changed (1)
  1. README.md +17 -8
README.md CHANGED
@@ -20,14 +20,11 @@ tags:
  - text-generation-inference
  ---
 
- # My Model Name
- Llama-2-7b-chat-finetune
 
  ## Model Description
- This is a Llama 2 Fine Tuned Model with 7 Billion Parameters on the 1K sample Dataset from **mlabonne/guanaco-llama2-1k**
-
- ## Training Data
- The training data is basically a Chat between a Human and an Assistant where the Human poses some queries and the Assistant responds to those queries in a suitable fashion.
+ This is a Llama 2 model with 7 billion parameters, fine-tuned on the dataset from **mlabonne/guanaco-llama2**. The training data is essentially a chat between a human and an assistant, in which the human poses queries and the assistant responds in a suitable fashion.
  In the case of Llama 2, the following Chat Template is used for the chat models:
 
  **[INST] SYSTEM PROMPT**
@@ -39,5 +36,17 @@ User prompt (required) --> to give the instruction / User Query
 
  Model Answer (required)
 
- ## Evaluation
- Details about evaluation metrics and results.
+ ## Training Data
+ The instruction dataset is reformatted to follow the Llama 2 template above.
+ **Original Dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco
+ **Reformatted Dataset with 1K Samples** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k
+ **Complete Reformatted Dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2
+
+ To drastically reduce VRAM usage, we must fine-tune the model in 4-bit precision, which is why we use QLoRA here. The GPU on which the model was fine-tuned was an **L4 (Google Colab Pro)**.
+
+ ## Process
+ 1) Load the dataset as defined.
+ 2) Configure bitsandbytes for 4-bit quantization.
+ 3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
+ 4) Load the configurations for QLoRA and the regular training parameters, and pass everything to the SFTTrainer.
+ 5) Fine-tuning starts.
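To make the template concrete, here is a minimal sketch (Python) of how one training sample can be rendered in the Llama 2 chat format described in the diff above; the helper name and the example prompts are illustrative, not taken from this repository.

```python
# Minimal sketch: render one sample in the Llama 2 chat template.
# The prompts below are placeholders, not rows from the actual dataset.

def format_llama2_sample(system_prompt: str, user_prompt: str, answer: str) -> str:
    """Wrap a (system, user, answer) triple in Llama 2 chat markup."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_prompt} [/INST] {answer} </s>"
    )

print(format_llama2_sample(
    "You are a helpful assistant.",   # SYSTEM PROMPT (optional)
    "What is QLoRA?",                 # User prompt (required)
    "QLoRA attaches LoRA adapters to a 4-bit quantized base model.",  # Model answer
))
```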
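The reformatted data referenced in the new Training Data section can be pulled straight from the Hub with the `datasets` library; a minimal sketch, assuming the single `text` column that the guanaco-llama2 reformatting produces:

```python
from datasets import load_dataset

# The 1K-sample reformatted dataset (second link in the section above).
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Each row carries one "text" field already wrapped in Llama 2 chat markup.
print(dataset[0]["text"][:200])
```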
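A minimal sketch of the 4-bit quantization setup the QLoRA paragraph refers to, using `BitsAndBytesConfig` from `transformers`; the NF4 quantization type and float16 compute dtype are common QLoRA defaults, assumed here rather than read from the training script.

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit (NF4) weights keep the 7B model small enough for L4 VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for dequantized compute
)
```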
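Putting the five Process steps together, here is a condensed, self-contained sketch of a typical QLoRA run with `peft` and `trl`; the base checkpoint, the hyperparameters, and the older `SFTTrainer` keyword arguments (`dataset_text_field`, `tokenizer`) are assumptions based on common usage of these libraries, not values confirmed by this commit.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"  # assumption: an ungated Llama 2 chat checkpoint

# 1) Load the dataset (already reformatted to the Llama 2 template).
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# 2) Configure bitsandbytes for 4-bit quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# 3) Load the base model in 4-bit precision with its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# 4) QLoRA adapter config plus regular training arguments, handed to SFTTrainer.
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1,
                         task_type="CAUSAL_LM")
training_args = TrainingArguments(
    output_dir="./results", num_train_epochs=1,
    per_device_train_batch_size=4, learning_rate=2e-4, fp16=True)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # column holding the templated samples
    tokenizer=tokenizer,
    args=training_args,
)

# 5) Fine-tuning starts.
trainer.train()
```

After training, `trainer.model.save_pretrained(...)` stores only the small LoRA adapter weights, which is part of what keeps this workflow practical on a single L4.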