nicholasKluge committed
Commit 748c8f7 · 1 parent: 69794df

Update README.md

Files changed (1): README.md (+14 -3)
README.md CHANGED
@@ -43,7 +43,9 @@ co2_eq_emissions:
   geographical_location: United States of America
   hardware_used: NVIDIA A100-SXM4-40GB
   ---
-# TeenyTinyLlama-460m-Chat
+# TeenyTinyLlama-460m-Chat-awq
+
+**Note: This model is a quantized version of [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat). Quantization was performed using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), allowing this version to be 80% lighter with almost no performance loss. A GPU is required to run the AWQ-quantized models.**
 
 TeenyTinyLlama is a pair of small foundational models trained in Brazilian Portuguese.
 
@@ -55,17 +57,26 @@ This repository contains a version of [TeenyTinyLlama-460m](https://huggingface.
 - **Batch size:** 4
 - **Optimizer:** `torch.optim.AdamW` (warmup_steps = 1e3, learning_rate = 1e-5, epsilon = 1e-8)
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
-- **Carbon emissions** stats are logged in this [file](emissions.csv).
+- **Quantization Configuration:**
+  - `bits`: 4
+  - `group_size`: 128
+  - `quant_method`: "awq"
+  - `version`: "gemm"
+  - `zero_point`: True
 
-This repository has the [source code](https://github.com/Nkluge-correa/Aira) used to train this model.
+This repository has the [source code](https://github.com/Nkluge-correa/TeenyTinyLlama) used to train this model.
 
 ## Usage
 
+**Note: Using quantized models requires the installation of `autoawq==0.1.7`. A GPU is required to run the AWQ-quantized models.**
+
 The following special tokens are used to mark the user side of the interaction and the model's response:
 
 `<instruction>`What is a language model?`</instruction>`A language model is a probability distribution over a vocabulary.`</s>`
 
 ```python
+!pip install autoawq==0.1.7 -q
+
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
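The diff's "80% lighter" claim can be sanity-checked with some back-of-envelope arithmetic of our own (not part of the commit), using the quantization configuration it adds: 4 bits per weight, with a per-group scale and zero-point every `group_size = 128` weights.

```python
# Back-of-envelope size estimate for a 460M-parameter model quantized to
# 4-bit AWQ with group_size = 128. The parameter count and config values
# come from the README diff; the overhead model below is our own rough
# assumption (an fp16 scale plus a 4-bit zero-point per group of 128).
PARAMS = 460_000_000

fp32_bytes = PARAMS * 4                      # full-precision checkpoint
weight_bytes = PARAMS * 4 / 8                # 4 bits per weight
overhead_bytes = (PARAMS / 128) * (2 + 0.5)  # per-group scale + zero-point
awq_bytes = weight_bytes + overhead_bytes

reduction = 1 - awq_bytes / fp32_bytes
print(f"~{reduction:.0%} smaller")  # ~87% smaller
```

This idealized figure lands above the README's "80% lighter" because, in practice, some tensors (e.g. embeddings) are typically kept in higher precision, pulling the real on-disk saving down.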
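The special-token format the README describes can be sketched as a small helper. The marker strings (`<instruction>`, `</instruction>`, `</s>`) come from the README; the helper name `format_prompt` is hypothetical, ours for illustration.

```python
# Minimal sketch of TeenyTinyLlama's chat format. The special tokens are
# taken from the README; `format_prompt` is a hypothetical helper name.
def format_prompt(question: str) -> str:
    """Wrap a user turn in the model's instruction markers."""
    return f"<instruction>{question}</instruction>"

prompt = format_prompt("What is a language model?")
print(prompt)  # <instruction>What is a language model?</instruction>
```

The model is then expected to continue the sequence with its answer and close it with `</s>`, as in the README's example.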