ricardo-larosa committed
Commit 896b902
1 Parent(s): fa17e9d

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -20,3 +20,15 @@ base_model: unsloth/mistral-7b-instruct-v0.2-bnb-4bit
This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
# Techniques used
1. Quantization: Unsloth provides 4-bit quantized models, which are 4x faster to download and use 4x less memory (I observed that the reduced precision did not hurt the model's performance much). See the loading sketch after this list.
2. Low-Rank Adaptation (LoRA): Unsloth provides LoRA adapters, which make it possible to update only 1 to 10% of all parameters (I am glad I did all the SVD exercises in CS224N!). Paper: https://arxiv.org/pdf/2106.09685
3. Rotary Positional Embedding (RoPE) Scaling: Unsloth supports RoPE scaling internally instead of traditional positional embeddings. Paper: https://arxiv.org/abs/2104.09864
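
To make the three points above concrete, here is a minimal loading sketch using Unsloth's FastLanguageModel API. The hyperparameter values (sequence length, LoRA rank, target modules) are illustrative assumptions, not the settings used to train this checkpoint.

```python
# Illustrative sketch; hyperparameter values are assumptions, not the
# exact settings used for this checkpoint.
from unsloth import FastLanguageModel

# 1. Quantization: load the 4-bit quantized base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    max_seq_length=2048,   # 3. Unsloth applies RoPE scaling internally for longer contexts
    dtype=None,            # auto-detect (bf16 on A100, fp16 on older GPUs)
    load_in_4bit=True,     # 4x less memory, 4x faster download
)

# 2. LoRA: attach low-rank adapters so only a small fraction of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing=True,
)
```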
# Performance
I did not see any OOMs, and memory usage was steady at 10 GB on an A100 GPU (I could easily have used a V100).
In addition to these performance optimizations, I spent some time tweaking the parameters of the Supervised Fine-tuning Trainer (SFTTrainer) from the TRL library; a configuration sketch is shown below.
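
For reference, a minimal SFTTrainer setup looks roughly like the following. It continues from the loading sketch above, assumes a Hugging Face dataset named `dataset` with a formatted `text` column, and uses placeholder hyperparameters rather than the values I actually settled on; depending on the TRL version, some of these arguments belong in an SFTConfig instead.

```python
# Placeholder hyperparameters; not the exact settings used for this model.
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                # LoRA-wrapped model from the loading sketch above
    tokenizer=tokenizer,
    train_dataset=dataset,      # assumed: a dataset with a "text" column of formatted prompts
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```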
# Prompting
Finally, the prompt template is a simple Alpaca-like template with the fields instruction, english_sentence, and logical_form. The same template is used for training and inference; a sketch is shown below.
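
The README does not reproduce the template text itself, so the sketch below only illustrates what an Alpaca-style template over these three fields could look like; the section headers and exact wording are assumptions.

```python
# Illustrative Alpaca-style template; the exact wording of the real template
# is not shown here, so treat the section headers as placeholders.
PROMPT_TEMPLATE = """### Instruction:
{instruction}

### English sentence:
{english_sentence}

### Logical form:
{logical_form}"""

def build_prompt(instruction: str, english_sentence: str, logical_form: str = "") -> str:
    # During training, logical_form carries the target; at inference it is
    # left empty so the model generates it.
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        english_sentence=english_sentence,
        logical_form=logical_form,
    )

# Inference-time example: the logical form is left blank for the model to complete.
print(build_prompt(
    "Translate the sentence into a logical form.",
    "Every student reads a book.",
))
```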