# Deployed Model

AjayMukundS/Llama-2-7b-chat-finetune

## Model Description

This is a Llama 2 model with 7 billion parameters, fine-tuned on the **mlabonne/guanaco-llama2** dataset. The training data is a chat between a human and an assistant, in which the human poses queries and the assistant responds to them in a suitable fashion.
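A minimal sketch of querying this checkpoint for inference with the `transformers` pipeline API (the prompt string follows the chat template described below; generation settings are illustrative, not prescribed by this card):

```python
# Minimal sketch: querying the deployed checkpoint with the transformers
# pipeline API (requires a GPU with enough memory for a 7B model).
from transformers import pipeline

generator = pipeline("text-generation", model="AjayMukundS/Llama-2-7b-chat-finetune")
prompt = "<s>[INST] What is a large language model? [/INST]"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```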
In the case of Llama 2, the following chat template is used for the chat models:

**[INST] SYSTEM PROMPT**

System prompt (optional) --> to guide the model

**USER PROMPT [/INST]**

User prompt (required) --> to give the instruction / User Query

**MODEL ANSWER**

Model Answer (required)
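For concreteness, here is a short sketch of how one sample can be wrapped in this template; `format_sample` is a hypothetical helper (not part of the training code), and the `<s>` and `<<SYS>>` markers are the special tokens Llama 2 chat models expect:

```python
# Hypothetical helper (not from the training code): wraps one query/answer
# pair, plus an optional system prompt, in the Llama 2 chat template above.

def format_sample(user_prompt: str, model_answer: str, system_prompt: str = "") -> str:
    """Return a single training string in Llama 2 chat markup."""
    if system_prompt:
        # The optional system prompt sits inside <<SYS>> ... <</SYS>> tags.
        user_prompt = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt}"
    return f"<s>[INST] {user_prompt} [/INST] {model_answer} </s>"

print(format_sample(
    "What is QLoRA?",
    "QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters.",
))
# -> <s>[INST] What is QLoRA? [/INST] QLoRA fine-tunes a ... </s>
```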
## Training Data

The instruction dataset is reformatted to follow the Llama 2 template above.
**Original Dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco

**Reformatted Dataset with 1K Samples** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

**Complete Reformatted Dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2
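A minimal sketch of pulling the reformatted 1K-sample split with the Hugging Face `datasets` library; each row is one pre-formatted chat string:

```python
# Minimal sketch: loading the reformatted 1K-sample dataset.
from datasets import load_dataset

dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
print(dataset.column_names)     # ['text']
print(dataset[0]["text"][:80])  # begins with "<s>[INST] ..."
```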
To drastically reduce VRAM usage, the model must be fine-tuned in 4-bit precision, which is why QLoRA is used here. The GPU on which the model was fine-tuned was an **L4 (Google Colab Pro)**.
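As an illustration, a typical QLoRA 4-bit quantization config looks like the following; the values shown are common defaults and an assumption, not necessarily the exact settings of this run:

```python
# Sketch of a typical QLoRA 4-bit quantization config (assumed values,
# not necessarily the exact settings used for this model).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16 on the L4
)
```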
## Process

1) Load the dataset as defined above.
2) Configure bitsandbytes for 4-bit quantization.
3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
4) Load the QLoRA configuration and the regular training parameters, and pass everything to the SFTTrainer (see the sketch after this list).
5) Fine-tuning starts.
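Putting the steps together, here is a condensed sketch of the fine-tuning run. The base checkpoint name and all hyperparameter values are assumptions for illustration (this card does not state them), and the `SFTTrainer` keyword arguments follow the older TRL (pre-0.9) API that was current for 2023-era Llama 2 tutorials:

```python
# Condensed sketch of steps 1-5. Base checkpoint and hyperparameters are
# illustrative assumptions; SFTTrainer kwargs follow the pre-0.9 TRL API.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"  # assumption: ungated Llama 2 chat mirror

# 1) Load the dataset as defined.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# 2) Configure bitsandbytes for 4-bit quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# 3) Load the model in 4-bit precision with the corresponding tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# 4) QLoRA configuration and training parameters, passed to the SFTTrainer.
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1,
    bias="none", task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # each row is one pre-formatted chat string
    tokenizer=tokenizer,
    args=training_args,
)

# 5) Fine-tuning starts.
trainer.train()
trainer.model.save_pretrained("Llama-2-7b-chat-finetune")
```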
|