Text Generation
Transformers
English
llama
smol_llama
llama2
Eval Results
Inference Endpoints
text-generation-inference
blockblockblock committed
Commit a21bdf4
1 Parent(s): 825dc55

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,198 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - smol_llama
+ - llama2
+ datasets:
+ - JeanKaddour/minipile
+ - pszemraj/simple_wikipedia_LM
+ - mattymchen/refinedweb-3m
+ - BEE-spoke-data/knowledge-inoc-concat-v1
+ inference:
+   parameters:
+     max_new_tokens: 64
+     do_sample: true
+     temperature: 0.8
+     repetition_penalty: 1.05
+     no_repeat_ngram_size: 4
+     eta_cutoff: 0.0006
+     renormalize_logits: true
+ widget:
+ - text: My name is El Microondas the Wise, and
+   example_title: El Microondas
+ - text: Kennesaw State University is a public
+   example_title: Kennesaw State University
+ - text: Bungie Studios is an American video game developer. They are most famous for
+     developing the award winning Halo series of video games. They also made Destiny.
+     The studio was founded
+   example_title: Bungie
+ - text: The Mona Lisa is a world-renowned painting created by
+   example_title: Mona Lisa
+ - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
+   example_title: Harry Potter Series
+ - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
+     have water, but no fish. What am I?
+
+     Answer:'
+   example_title: Riddle
+ - text: The process of photosynthesis involves the conversion of
+   example_title: Photosynthesis
+ - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
+     and a loaf of bread. When she got home, she realized she forgot
+   example_title: Story Continuation
+ - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+     and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+     they meet if the distance between the stations is 300 miles?
+
+     To determine'
+   example_title: Math Problem
+ - text: In the context of computer programming, an algorithm is
+   example_title: Algorithm Definition
+ pipeline_tag: text-generation
+ model-index:
+ - name: smol_llama-220M-GQA
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 24.83
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 29.76
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 25.85
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 44.55
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 50.99
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.68
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA
+       name: Open LLM Leaderboard
+ ---
+
+
+ # smol_llama: 220M GQA
+
+ > model card WIP, more details to come
+
+
+ A small decoder-only model with 220M parameters in total; this is the first version of the model. A brief usage sketch follows the spec list below.
+
+ - hidden size 1024, 10 layers
+ - GQA (32 attention heads, 8 key/value heads), context length 2048
+ - trained from scratch on one GPU :)
+
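+ To make the inference parameters from the metadata above concrete, here is a minimal loading-and-generation sketch with `transformers`. The repo id `BEE-spoke-data/smol_llama-220M-GQA` is taken from the model-index metadata; adjust it if you are loading this particular upload instead.
+
+ ```python
+ # Minimal sketch: generate with the card's suggested sampling settings.
+ # Assumes `transformers` and `torch` are installed; the repo id comes
+ # from the model-index metadata above.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo_id = "BEE-spoke-data/smol_llama-220M-GQA"
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModelForCausalLM.from_pretrained(repo_id)
+
+ inputs = tokenizer("My name is El Microondas the Wise, and", return_tensors="pt")
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=64,
+     do_sample=True,
+     temperature=0.8,
+     repetition_penalty=1.05,
+     no_repeat_ngram_size=4,
+     eta_cutoff=0.0006,
+     renormalize_logits=True,
+ )
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+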
+ ## Links
+
+ [Here](https://huggingface.co/collections/BEE-spoke-data/finetuned-smol-220m-65998b080ae723e79c830f83) are some fine-tunes we did, but there are many more possibilities out there!
+
+ - instruct
+   - openhermes - [link](https://huggingface.co/BEE-spoke-data/smol_llama-220M-openhermes)
+   - open-instruct - [link](https://huggingface.co/BEE-spoke-data/smol_llama-220M-open_instruct)
+ - code
+   - python (pypi) - [link](https://huggingface.co/BEE-spoke-data/beecoder-220M-python)
+ - zephyr DPO tune
+   - SFT - [link](https://huggingface.co/BEE-spoke-data/zephyr-220m-sft-full)
+   - full DPO - [link](https://huggingface.co/BEE-spoke-data/zephyr-220m-dpo-full)
+
+ ---
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-GQA).
+
+ | Metric                            | Value |
+ |-----------------------------------|------:|
+ | Avg.                              | 29.44 |
+ | AI2 Reasoning Challenge (25-Shot) | 24.83 |
+ | HellaSwag (10-Shot)               | 29.76 |
+ | MMLU (5-Shot)                     | 25.85 |
+ | TruthfulQA (0-shot)               | 44.55 |
+ | Winogrande (5-shot)               | 50.99 |
+ | GSM8k (5-shot)                    |  0.68 |
+
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v17-KI",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "max_position_embeddings": 2048,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 10,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.37.0.dev0",
+   "use_cache": false,
+   "vocab_size": 32128
+ }
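As a sanity check on the card's "220M total" figure, the parameter count can be estimated directly from the config fields above. The sketch below is a back-of-the-envelope calculation for the standard Llama layout (untied embeddings, GQA attention, SwiGLU MLP, no biases since `attention_bias` is false); it is an estimate, not an official count.

```python
# Hedged sketch: estimate total parameter count from config.json above.
vocab_size = 32128
hidden = 1024
layers = 10
intermediate = 4096
n_heads = 32
n_kv_heads = 8
head_dim = hidden // n_heads  # 32

embed = vocab_size * hidden        # input embeddings
lm_head = vocab_size * hidden      # untied output head (tie_word_embeddings: false)
attn = 2 * hidden * hidden \
     + 2 * hidden * n_kv_heads * head_dim  # q/o projections + smaller k/v (GQA)
mlp = 3 * hidden * intermediate    # gate, up, down projections (SwiGLU)
norms = 2 * hidden                 # two RMSNorm weights per layer
total = embed + lm_head + layers * (attn + mlp + norms) + hidden  # + final norm

print(f"{total:,}")  # ~218M, consistent with the "220M total" in the card
```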
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.37.0.dev0",
+   "use_cache": false
+ }
output.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b065fe306f9e6a504acbf5a66fcf49a3ad98b40377dd27ca8993051dc6032f78
+ size 159259728
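This file (like `tokenizer.model` below) is checked in as a Git LFS pointer: the repo stores only the spec version, a SHA-256 object id, and the byte size, while the actual ~159 MB blob lives in LFS storage. A hedged sketch of verifying a downloaded copy against the pointer; the local filename is an assumption for illustration.

```python
# Hedged sketch: check a downloaded file against its Git LFS pointer,
# which records only a sha256 oid and a byte size.
import hashlib
from pathlib import Path

def verify_lfs_object(path: str, oid: str, size: int) -> bool:
    """Return True if the file at `path` matches the pointer's oid and size."""
    data = Path(path).read_bytes()
    return len(data) == size and hashlib.sha256(data).hexdigest() == oid

ok = verify_lfs_object(
    "output.safetensors",  # assumed local path
    oid="b065fe306f9e6a504acbf5a66fcf49a3ad98b40377dd27ca8993051dc6032f78",
    size=159259728,
)
print("pointer matches" if ok else "mismatch")
```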
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "trust_remote_code": false,
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true,
+   "use_fast": true
+ }
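Two of these settings are easy to miss: `add_bos_token: true` means every encoded sequence is prefixed with `<s>` (id 1), and `pad_token` reuses `</s>` (id 2) because there is no dedicated pad token. A hedged sketch of what that looks like in practice, assuming the tokenizer is loaded from the model-card repo id:

```python
# Hedged sketch: observe add_bos_token and the reused pad token from the
# tokenizer_config.json above. The repo id is taken from the model card.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("BEE-spoke-data/smol_llama-220M-GQA")

ids = tok("hello world").input_ids
print(ids[0] == tok.bos_token_id)  # True: <s> (id 1) is prepended automatically

# No dedicated pad token: padding reuses </s> (id 2), on the right side.
batch = tok(["hi", "a longer sentence"], padding=True, return_tensors="pt")
print(tok.pad_token_id == tok.eos_token_id)  # True
```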