---
license: other
---
# Model Card for llama-30b-hf-53q_4bit-128g_WVU

## Model Description

`llama-30b-hf-53q_4bit-128g_WVU` is a model based on the Llama architecture with 30 billion parameters.
The first 53 decoder layers have been quantized with the [`gptq`](https://github.com/qwopqwop200/GPTQ-for-LLaMa) method, using 4-bit precision and a group size of 128.
The last 7 decoder layers (1/8 of the decoder layers) and the `lm_head` have then been fine-tuned on the [wizard_vicuna_70k_unfiltered dataset](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered) for 1 epoch.
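As an illustration of the group-wise 4-bit scheme described above, here is a minimal round-to-nearest sketch in plain Python. This is not the actual GPTQ algorithm (GPTQ additionally uses second-order information to compensate quantization error layer by layer); the function names are purely illustrative.

```python
def quantize_group(weights, bits=4):
    """Quantize one group of weights with a shared scale (round-to-nearest)."""
    levels = 2 ** bits - 1  # 15 representable steps for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # avoid division by zero for flat groups
    codes = [round((w - w_min) / scale) for w in weights]  # 4-bit integer codes
    return [w_min + c * scale for c in codes]  # dequantized values

def quantize_row(row, group_size=128, bits=4):
    """Quantize a weight row in independent groups of 128, as in this model."""
    out = []
    for i in range(0, len(row), group_size):
        out.extend(quantize_group(row[i:i + group_size], bits))
    return out
```

Using a shared scale per group of 128 keeps the storage overhead of the scales small while limiting the range each 4-bit code has to cover.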

## Note

Quantization effectively reduces memory usage; however, it may introduce small errors in the parameters.
Additionally, fine-tuning only the last few layers lowers the memory required for training, but could lead to minor performance degradation.
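The split between the frozen quantized layers and the trainable ones can be sketched as a simple predicate over parameter names. This assumes the Hugging Face `transformers` naming convention for LLaMA weights (`model.layers.<i>. ...`, `lm_head.weight`) and the 60 decoder layers of the 30B model; it is an illustration, not the repository's actual training code.

```python
def is_trainable(param_name, num_layers=60, num_finetuned=7):
    """Return True for parameters fine-tuned under this model's recipe:
    the last `num_finetuned` decoder layers plus the lm_head."""
    if param_name.startswith("lm_head"):
        return True
    if param_name.startswith("model.layers."):
        layer_idx = int(param_name.split(".")[2])
        return layer_idx >= num_layers - num_finetuned
    return False  # embeddings, norms, and the quantized layers stay frozen
```

With a loaded model, one would then set `p.requires_grad = is_trainable(name)` for each named parameter before training.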

Several alternatives exist for fine-tuning and quantizing Llama models. The specific method used here, quantizing the first several layers and then fine-tuning the last few, is designed to account for errors introduced during quantization (which can sometimes result in unexpected answers), allowing the last few layers to be fine-tuned with both the quantization error and the dataset taken into account.

It is worth mentioning that other methods may yield superior performance. For instance:
1. Fine-tuning the entire model for `X` epochs
2. Quantizing the first `K` layers
3. Fine-tuning the remaining layers for `Y` epochs

Nonetheless, as fine-tuning the entire model requires considerable resources (for example, 4 GPUs with 80GB of VRAM are required even for the 7B LLaMa), this model omits the first step of the method described above, and it still works.
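The resource claim can be sanity-checked with a common rule of thumb: full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter (2 for fp16 weights, 2 for gradients, and 12 for fp32 master weights plus the two optimizer moments), before activations and framework overhead. The function below is a back-of-the-envelope estimate, not a measurement.

```python
def full_finetune_vram_gb(n_params, bytes_per_param=16):
    # ~2 B weights + 2 B grads + 12 B Adam state per parameter,
    # with activations and overhead excluded
    return n_params * bytes_per_param / 1e9
```

For 7 billion parameters this gives about 112 GB for the weights, gradients, and optimizer state alone, which is consistent with needing multiple 80GB GPUs in practice.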

## Using the Model

To load the model, a custom `LlamaForCausalLM` is required.
You can find the quantized llama code [here](https://github.com/LearnItAnyway/quantized_llama).

## References

1. Meta - LLaMA
2. [WizardLM](https://github.com/nlpxucan/WizardLM)
3. [GPTQ for LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa)
4. [Wizard Vicuna Unfiltered Dataset](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered)
5. Various unlisted but great works, research, and projects.