Tags: Text Generation · Transformers · Safetensors · English · llama · Llama-3-6B · 6B · Inference Endpoints · text-generation-inference
prince-canuma committed on
Commit 82d3500
1 Parent(s): 7aa47ee

Update README.md

Files changed (1)
  1. README.md +163 -76

README.md CHANGED
@@ -7,17 +7,13 @@ datasets:
   - prince-canuma/fineweb-CC-MAIN-2024-10-1B-en
  ---
 
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
- ## Model Details
-
- ### Model Description
  <img src="llama-3-6B icon.jpeg" width="500" alt="Llama-3-6B"/>
 
  <!-- Provide a longer summary of what this model is. -->
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
@@ -30,61 +26,72 @@ This is the model card of a 🤗 transformers model that has been pushed on the
  - **License:** [More Information Needed]
  - **Finetuned from model [optional]:** [More Information Needed]
 
  ### Model Sources [optional]
 
  <!-- Provide the basic links for the model. -->
 
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
 
  ## Uses
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
 
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
 
- ## Bias, Risks, and Limitations
 
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
- [More Information Needed]
 
- ### Recommendations
 
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
- ## How to Get Started with the Model
 
- Use the code below to get started with the model.
 
- [More Information Needed]
 
  ## Training Details
 
  ### Training Data
 
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
 
  ### Training Procedure
@@ -95,15 +102,45 @@ Use the code below to get started with the model.
  [More Information Needed]
 
- #### Training Hyperparameters
 
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
- #### Speeds, Sizes, Times [optional]
 
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
- [More Information Needed]
 
  ## Evaluation
@@ -136,42 +173,12 @@ Use the code below to get started with the model.
  #### Summary
 
-
  ## Model Examination [optional]
 
  <!-- Relevant interpretability work for the model goes here -->
 
  [More Information Needed]
 
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
 
  ## Citation [optional]
@@ -185,4 +192,84 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  author={Prince Canuma},
  year={2024},
  }
- ```
  - prince-canuma/fineweb-CC-MAIN-2024-10-1B-en
  ---
 
+ # Model Summary
  <img src="llama-3-6B icon.jpeg" width="500" alt="Llama-3-6B"/>
 
+ This is the world's first Llama-3 base model with 6B parameters. It is a pretrained version of [prince-canuma/Llama-3-6B-v0](https://huggingface.co/prince-canuma/Llama-3-6B-v0), which was downcycled from Meta-Llama-3-8B.
+ It was continually pretrained on 1B tokens of English-only text from FineWeb and achieves the following results on the evaluation set:
+ - Loss: 2.4942
+
  <!-- Provide a longer summary of what this model is. -->
 
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
  - **License:** [More Information Needed]
  - **Finetuned from model [optional]:** [More Information Needed]
 
+ ## Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [Prince Canuma](https://huggingface.co/prince-canuma)
+ - **Model type:** Transformer
+ - **License:** MIT
+ - **Finetuned from model:** prince-canuma/Llama-3-6B-v0
+
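Downcycling here refers to initializing a smaller Llama architecture from a subset of the donor model's weights before continued pretraining. The exact recipe used to build Llama-3-6B-v0 is not spelled out in this card, so the snippet below is only a minimal sketch of the idea, assuming a plain layer-pruning scheme with a 24-layer target depth and the first 24 decoder layers kept (both are illustrative assumptions, not the confirmed recipe):

```python
import torch
from transformers import AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

# Donor model (gated repo: requires accepting Meta's license on the Hub)
donor = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16)

# Same width, vocab and attention setup, but fewer decoder layers (24 is an assumption)
config = LlamaConfig.from_pretrained("meta-llama/Meta-Llama-3-8B", num_hidden_layers=24)
small = LlamaForCausalLM(config).to(torch.bfloat16)

# Reuse the donor's embeddings, final norm and LM head
small.model.embed_tokens.load_state_dict(donor.model.embed_tokens.state_dict())
small.model.norm.load_state_dict(donor.model.norm.state_dict())
small.lm_head.load_state_dict(donor.lm_head.state_dict())

# Keep the first 24 decoder layers (other selection schemes are possible)
for i in range(config.num_hidden_layers):
    small.model.layers[i].load_state_dict(donor.model.layers[i].state_dict())

small.save_pretrained("Llama-3-6B-v0-sketch", safe_serialization=True)
```

A 24-layer Llama-3 configuration lands in the ~6B-parameter range; the quality lost by dropping layers is then recovered through continued pretraining such as the FineWeb run described below.
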
  ### Model Sources [optional]
 
  <!-- Provide the basic links for the model. -->
 
+ - **Repository:** https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3
+ - **Video [optional]:** https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=5Y4cm-6wrMOD1Abr
 
  ## Uses
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ You can use this model to create instruct and chat versions for various use cases, such as a coding assistant, RAG, function calling, and more.
 
+ ### Limitations
 
+ This model inherits some of the base model's limitations, plus a few additional ones from its creation process, such as:
+ - Limited scope for coding and math: according to benchmarks, this model needs more pretraining/finetuning on code and math data to excel at reasoning tasks.
+ - Language limitations: this model was continually pretrained on English-only data. If you plan to use it for multilingual use cases, I recommend fine-tuning or continued pretraining.
 
+ ## How to Get Started with the Model
 
+ Use the code below to get started with the model.
 
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+ # Load model and tokenizer
+ model_name = "prince-canuma/Llama-3-6B-v0.1"
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Tokenize the prompt
+ inputs = tokenizer(["Who created Python?"], return_tensors="pt")
+
+ # Stream generated tokens to stdout as they are produced
+ text_streamer = TextStreamer(tokenizer)
+ _ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=200)
+ ```
 
+ Output:
+ ```shell
+ <|begin_of_text|>Who created Python? What is Python used for? What is the difference between Python 2 and Python 3? What is the difference between Python and Python 3?
+ Python is a programming language that was created by Guido van Rossum in 1991. It is a widely used language for web development, data science, and machine learning. Python is also used for creating software applications and games.
+ Python is a powerful language that is easy to learn and use. It has a large library of built-in functions and packages that make it easy to write code. Python is also a very popular language for web development, with many popular web frameworks such as Django and Flask being written in Python.
+ Python is also used for data science and machine learning. It has a large library of packages for data analysis, machine learning, and artificial intelligence. Python is also used for creating software applications and games.
+ Python 2 and Python 3 are two different versions of the Python language. Python 2 was the original version of the
+ ```
  ## Training Details
 
  ### Training Data
 
+ For continued pretraining, I extracted 1B tokens from [Hugging Face's FineWeb CC-MAIN-2024-10](https://huggingface.co/datasets/HuggingFaceFW/fineweb#breakdown-by-dumpcrawl) slice.
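
The extraction script itself is not included in this card; a 1B-token slice can be drawn from that dump along the lines of the sketch below, which streams the dump and stops at a token budget (the tokenizer used for counting, the greedy stopping rule, and the omission of extra language filtering are assumptions):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream the CC-MAIN-2024-10 dump of FineWeb (no full download needed)
ds = load_dataset("HuggingFaceFW/fineweb", name="CC-MAIN-2024-10", split="train", streaming=True)
tokenizer = AutoTokenizer.from_pretrained("prince-canuma/Llama-3-6B-v0")

token_budget = 1_000_000_000  # 1B tokens
collected, docs = 0, []
for row in ds:
    docs.append(row["text"])
    collected += len(tokenizer(row["text"]).input_ids)
    if collected >= token_budget:
        break
```

The collected documents would then be packaged as a dataset such as prince-canuma/fineweb-CC-MAIN-2024-10-1B-en, listed in the metadata at the top of this card.
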
  ### Training Procedure
 
  [More Information Needed]
 
+ #### Training hyperparameters
 
+ The following hyperparameters were used during training (a rough `TrainingArguments` equivalent is sketched after the list):
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 2
 
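For readers more familiar with the 🤗 Trainer API, the list above maps roughly onto the `TrainingArguments` below. This is an illustrative equivalent rather than the launcher actually used (the run was driven by Axolotl; see the config at the bottom of this card), and `bf16=True` is an assumption based on `bf16: auto` in that config:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama-3-6b",
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=2,    # eval_batch_size
    gradient_accumulation_steps=8,   # 2 per device x 4 GPUs x 8 steps = effective batch of 64
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=2,
    seed=42,
    bf16=True,                       # assumption: "bf16: auto" in the Axolotl config
    logging_steps=1,
    save_steps=4000,
)
```

Note that the effective train batch size of 64 comes from 2 samples per device × 4 GPUs × 8 gradient-accumulation steps.
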
+ ### Training results
 
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:-----:|:---------------:|
+ | 7.1562 | 0.0 | 1 | 7.1806 |
+ | 2.7339 | 0.25 | 5867 | 2.6266 |
+ | 2.6905 | 0.5 | 11734 | 2.5872 |
+ | 2.6134 | 0.75 | 17601 | 2.5549 |
+ | 2.532 | 1.0 | 23468 | 2.5235 |
+ | 2.5319 | 1.25 | 29335 | 2.5067 |
+ | 2.3336 | 1.5 | 35202 | 2.4968 |
+ | 2.3486 | 1.75 | 41069 | 2.4942 |
 
+ ### Framework versions
 
+ - PEFT 0.10.0
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.2.0+cu121
+ - Datasets 2.15.0
+ - Tokenizers 0.15.0
 
  ## Evaluation
  #### Summary
 
  ## Model Examination [optional]
 
  <!-- Relevant interpretability work for the model goes here -->
 
  [More Information Needed]
 
  ## Citation [optional]
 
  author={Prince Canuma},
  year={2024},
  }
+ ```
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: prince-canuma/Llama-3-6B-v0.1
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+
+ datasets:
+   - path: prince-canuma/fineweb-CC-MAIN-2024-10-1B-en
+     type: completion
+     split: train
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.001
+ output_dir: ./llama-3-6b
+ save_safetensors: true
+ adapter: qlora
+ lora_model_dir:
+
+ sequence_len: 8192
+ sample_packing: false
+ pad_to_sequence_len: false
+
+ lora_r: 128
+ lora_alpha: 128
+ lora_dropout: 0.05
+ lora_target_modules:
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+
+ wandb_project: llama-3-6b
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 2
+ num_epochs: 2
+ optimizer: paged_adamw_32bit
+ lr_scheduler: cosine
+ learning_rate: 2e-4
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 100
+ evals_per_epoch: 4
+ eval_table_size:
+ save_steps: 4000
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   pad_token: "<|reserved_special_token_0|>"
+ ```
+
+ </details><br>
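
Because this config trains a QLoRA adapter (`adapter: qlora`) rather than updating the full weights, publishing a standalone checkpoint typically involves merging the adapter back into its base model. A minimal PEFT sketch, assuming the adapter ended up in `./llama-3-6b` (the `output_dir` above) and the `base_model` from the config:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then apply and merge the trained LoRA adapter
base = AutoModelForCausalLM.from_pretrained("prince-canuma/Llama-3-6B-v0.1", torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "./llama-3-6b").merge_and_unload()

# Save the merged, standalone checkpoint alongside its tokenizer
merged.save_pretrained("Llama-3-6B-merged", safe_serialization=True)
AutoTokenizer.from_pretrained("prince-canuma/Llama-3-6B-v0.1").save_pretrained("Llama-3-6B-merged")
```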