ricardo-larosa committed on
Commit 7a4ba7b
1 Parent(s): 896b902

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -22,8 +22,8 @@ This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  # Techniques used
  1. Quantization: They provide 4-bit quantized models, which are 4x faster to download and use 4x less memory (I observed that the reduced precision did not noticeably affect the model's performance).
- 2. Lower Ranking Adaptation: They provide LoRA adapters which allow to only update 1 to 10% of all parameters (I am glad of all the SVG exercises in CS224N!). paper: https://arxiv.org/pdf/2106.09685
- 3. Rotary Positional Embedding Scaling: They support RoPE Scaling internally instead of traditional positional embeddings. paper: https://arxiv.org/abs/2104.09864
+ 2. Low-Rank Adaptation: They provide LoRA adapters, which allow updating only 1 to 10% of all parameters.
+ 3. Rotary Positional Embedding Scaling: They support RoPE scaling internally instead of traditional positional embeddings.
  
  # Performance
  I did not see any OOMs, and memory usage was steady at 10 GB on an A100 GPU (I could easily have used a V100).
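For readers who want to see what the techniques listed in the diff look like in practice, below is a minimal sketch of Unsloth's 4-bit loading plus LoRA setup, based on Unsloth's published `FastLanguageModel` API. The checkpoint name, LoRA rank, and target modules are illustrative assumptions, not the exact configuration behind this commit.

```python
# Minimal sketch (illustrative, not this repo's training script):
# 4-bit quantized loading and LoRA adapters via Unsloth's FastLanguageModel.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # assumed 4-bit quantized checkpoint
    max_seq_length=4096,   # longer contexts rely on Unsloth's internal RoPE scaling
    load_in_4bit=True,     # 4x smaller download, 4x less memory
    dtype=None,            # auto-detect (bf16 on an A100)
)

# Attach LoRA adapters so only a small fraction of parameters is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
)
```

As the README notes, RoPE scaling is handled internally by Unsloth, so requesting a `max_seq_length` beyond the base model's context window needs no extra configuration in this sketch.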