nicholasKluge committed on
Commit f643bcf
1 Parent(s): c91e1f8

Update README.md

Files changed (1)
  1. README.md +20 -4
README.md CHANGED
@@ -31,16 +31,19 @@ co2_eq_emissions:
  geographical_location: Germany
  hardware_used: NVIDIA A100-SXM4-40GB
  ---
- # TeenyTinyLlama-460m
+ # TeenyTinyLlama-460m-awq

  <img src="./logo.png" alt="A curious llama exploring a mushroom forest." height="200">

  ## Model Summary

+ **Note: This model is a quantized version of [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m). Quantization was performed using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), making this version 80% lighter, with almost no performance loss.**
+
  Given the lack of available monolingual foundational models in non-English languages, and the fact that some of the most used and downloaded models by the community are those small enough to allow individual researchers and hobbyists to use them in low-resource environments, we developed TeenyTinyLlama: _a pair of small foundational models trained in Brazilian Portuguese._

  TeenyTinyLlama is a compact language model based on the Llama 2 architecture ([TinyLlama implementation](https://huggingface.co/TinyLlama)). This model is designed to deliver efficient natural language processing capabilities while being resource-conscious. These models were trained by leveraging [scaling laws](https://arxiv.org/abs/2203.15556) to determine the optimal number of tokens per parameter while incorporating [preference pre-training](https://arxiv.org/abs/2112.00861).

+
  ## Details

  - **Architecture:** a Transformer-based model pre-trained via causal language modeling
@@ -53,6 +56,12 @@ TeenyTinyLlama is a compact language model based on the Llama 2 architecture ([T
  - **Training time**: ~ 280 hours
  - **Emissions:** 41.1 KgCO2 (Germany)
  - **Total energy consumption:** 115.69 kWh
+ - **Quantization Configuration:**
+   - `bits`: 4
+   - `group_size`: 128
+   - `quant_method`: "awq"
+   - `version`: "gemm"
+   - `zero_point`: True

  This repository has the [source code](https://github.com/Nkluge-correa/Aira) used to train this model. The main libraries used are:

@@ -63,6 +72,7 @@ This repository has the [source code](https://github.com/Nkluge-correa/Aira) use
  - [Sentencepiece](https://github.com/google/sentencepiece)
  - [Accelerate](https://github.com/huggingface/accelerate)
  - [Codecarbon](https://github.com/mlco2/codecarbon)
+ - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)

  Check out the training logs in [Weights and Biases](https://api.wandb.ai/links/nkluge-correa/vws4g032).

@@ -104,12 +114,16 @@ The primary intended use of TeenyTinyLlama is to research the behavior, function

  ## Basic usage

+ **Note: Using quantized models requires installing `autoawq==0.1.7`.**
+
  Using the `pipeline`:

  ```python
+ !pip install autoawq==0.1.7 -q
+
  from transformers import pipeline

- generator = pipeline("text-generation", model="nicholasKluge/TeenyTinyLlama-460m")
+ generator = pipeline("text-generation", model="nicholasKluge/TeenyTinyLlama-460m-awq")

  completions = generator("Astronomia é a ciência", num_return_sequences=2, max_new_tokens=100)

@@ -120,12 +134,14 @@ for comp in completions:
  Using the `AutoTokenizer` and `AutoModelForCausalLM`:

  ```python
+ !pip install autoawq==0.1.7 -q
+
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch

  # Load model and the tokenizer
- tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/TeenyTinyLlama-460m", revision='main')
- model = AutoModelForCausalLM.from_pretrained("nicholasKluge/TeenyTinyLlama-460m", revision='main')
+ tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/TeenyTinyLlama-460m-awq", revision='main')
+ model = AutoModelForCausalLM.from_pretrained("nicholasKluge/TeenyTinyLlama-460m-awq", revision='main')

  # Pass the model to your device
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
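For context on the quantization configuration added in this commit (`bits`: 4, `group_size`: 128, `quant_method`: "awq", `version`: "gemm", `zero_point`: True), the snippet below is a minimal sketch of the standard AutoAWQ quantization flow that such a configuration implies. It is an illustration only: the actual script used to produce this checkpoint is not part of this commit, the output path name is an assumption, and AutoAWQ's default calibration dataset is assumed.

```python
# Illustrative sketch only: a standard AutoAWQ quantization run matching the
# configuration listed in the diff (4-bit weights, group size 128, GEMM
# kernels, zero-point enabled). Not the authors' actual script.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "nicholasKluge/TeenyTinyLlama-460m"  # full-precision source model
quant_path = "TeenyTinyLlama-460m-awq"            # output directory (assumed name)

quant_config = {
    "w_bit": 4,           # stored as `bits`: 4
    "q_group_size": 128,  # stored as `group_size`: 128
    "version": "GEMM",    # stored as `version`: "gemm"
    "zero_point": True,   # stored as `zero_point`: True
}

# Load the FP16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Quantize; AutoAWQ falls back to its default calibration set when none is given
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and the tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The `version: "gemm"` entry selects AutoAWQ's GEMM kernels rather than the GEMV variant; the remaining keys are written into the checkpoint's `quantization_config` and read back automatically at load time.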
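The second usage hunk ends at the device setup; the rest of the README's generation example falls outside this diff's context. The sketch below shows one way the snippet could continue, reusing the `model`, `tokenizer`, and `device` defined above. The prompt mirrors the `pipeline` example, while the sampling settings are illustrative assumptions rather than the card's actual values.

```python
# Continuation sketch only: depends on `model`, `tokenizer`, `device`, and
# `torch` from the snippet above; sampling settings are illustrative.
model.eval()
model.to(device)

inputs = tokenizer("Astronomia é a ciência", return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,      # mirrors the pipeline example
        num_return_sequences=2,  # mirrors the pipeline example
        do_sample=True,          # needed for multiple distinct samples
    )

for i, output in enumerate(outputs):
    print(f"Completion {i}: {tokenizer.decode(output, skip_special_tokens=True)}")
```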