manojredhat committed
Commit 756e974 · verified · 1 Parent(s): 8f4b37b

Correct Tiny LLaMA model metadata

Files changed (4)
  1. MODEL_CARD.md +20 -44
  2. README.md +27 -174
  3. config.json +2 -2
  4. tokenizer_config.json +2 -2
MODEL_CARD.md CHANGED
@@ -5,70 +5,46 @@ license: apache-2.0
 
 # Tiny LLaMA
 
-A small LLaMA-2 inspired language model trained on TinyStories dataset.
-
-## Overview
-
-Tiny LLaMA is a 6.1M parameter language model designed for:
-- Educational purposes
-- Research on small models
-- Lightweight inference
-- Fine-tuning experiments
+A 6.27M parameter LLaMA-style causal language model trained on TinyStories.
 
 ## Model Specifications
 
 | Property | Value |
 |----------|-------|
-| Parameters | 6.1M |
+| Parameters | 6,270,624 |
 | Layers | 6 |
-| Attention Heads | 8 |
-| Hidden Dimension | 256 |
+| Attention Heads | 6 |
+| Key/Value Heads | 6 |
+| Head Dimension | 48 |
+| Hidden Size | 288 |
+| Intermediate Size | 768 |
 | Vocabulary Size | 512 |
-| Max Sequence Length | 2048 |
+| Training Sequence Length | 256 |
 | Data Type | float32 |
 
 ## Intended Use
 
-This model is intended for:
-- Text generation in the style of TinyStories
-- Research and educational purposes
-- Demonstration of language model capabilities at small scale
+- TinyStories-style text generation
+- Educational examples
+- Small-model research
+- ASHA backend inference testing
 
 ## Out-of-Scope Uses
 
-This model is not suitable for:
 - Production deployments
 - Knowledge-intensive tasks
-- Long-form document generation
-- Non-English content generation
+- Long-form generation
+- Multilingual generation
 
-## Training Data
-
-Trained on TinyStories dataset consisting of 50 shards of simple English stories.
-
-## Tokenizer
-
-Uses SentencePiece tokenizer with 512 vocabulary tokens, trained on the TinyStories dataset.
-
-## Performance Benchmarks
-
-- **Load Time**: ~50ms
-- **Inference Speed (CPU)**: 50-100 tokens/sec
-- **Memory (Weights)**: 24MB
-
-## How to Use
+## Usage
 
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
-tokenizer = AutoTokenizer.from_pretrained("username/tiny-llama")
-model = AutoModelForCausalLM.from_pretrained("username/tiny-llama")
+tokenizer = AutoTokenizer.from_pretrained("manojredhat/tiny-llama")
+model = AutoModelForCausalLM.from_pretrained("manojredhat/tiny-llama")
 
 inputs = tokenizer("Once upon a time", return_tensors="pt")
-outputs = model.generate(**inputs, max_length=100)
-print(tokenizer.decode(outputs[0]))
+outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-
-## Ethical Considerations
-
-This model is trained on simple children's stories and is intended for educational use only.
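The corrected parameter count in the new specification table can be verified by hand. Below is a minimal sketch (not part of the commit), assuming the standard Llama layout with untied embeddings and a SwiGLU MLP, which is what `llama2.c` checkpoints use:

```python
# Back-of-the-envelope parameter count for the corrected Tiny LLaMA specs.
# Assumes untied embeddings and a SwiGLU MLP (gate/up/down projections).
vocab, hidden, inter, layers = 512, 288, 768, 6

embed = vocab * hidden            # input embedding: 147,456
attn = 4 * hidden * hidden        # wq, wk, wv, wo:  331,776 per layer
mlp = 3 * hidden * inter          # gate, up, down:  663,552 per layer
norms = 2 * hidden                # two RMSNorms:        576 per layer
final_norm = hidden               # final RMSNorm:       288
lm_head = vocab * hidden          # output head:     147,456

total = embed + layers * (attn + mlp + norms) + final_norm + lm_head
print(total)  # 6,270,624 -- matches the table
```

A tied-embedding variant would give 6,123,168, so the published 6,270,624 figure implies a separate input embedding and output head.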
 
 
 
 
README.md CHANGED
@@ -18,203 +18,56 @@ model-index:
 
 # Tiny LLaMA - TinyStories Edition
 
-A lightweight LLaMA-2 inspired model trained on the TinyStories dataset. This model is designed for educational purposes and lightweight inference.
+A small LLaMA-style causal language model trained on the TinyStories dataset.
+This repository contains the Hugging Face `LlamaForCausalLM` conversion of the
+local checkpoint from `/home/manojk/small_llama/llama2.c/out/ckpt.pt`.
 
 ## Model Details
 
-- **Model Type**: Decoder-only Transformer (LLaMA architecture)
-- **Parameters**: 6.1M
+- **Model Type**: Decoder-only Transformer (`LlamaForCausalLM`)
+- **Parameters**: 6,270,624
 - **Layers**: 6
-- **Attention Heads**: 8
-- **Embedding Dimension**: 256
-- **Vocabulary Size**: 512 (SentencePiece)
-- **Max Sequence Length**: 2048
+- **Attention Heads**: 6
+- **Key/Value Heads**: 6
+- **Head Dimension**: 48
+- **Hidden Size**: 288
+- **Intermediate Size**: 768
+- **Vocabulary Size**: 512
+- **Training Sequence Length**: 256
 - **Data Type**: float32
 - **Format**: safetensors
 
 ## Training
 
-- **Dataset**: TinyStories (roneneldan/TinyStories)
-- **Data Shards**: 50
+- **Dataset**: TinyStories
 - **Training Iterations**: 100
 - **Initial Loss**: 6.27
 - **Final Loss**: 4.81
-- **Validation Loss**: 6.29 4.77
+- **Validation Loss**: 6.29 to 4.77
 
-## Quick Start
-
-### Installation
-
-```bash
-pip install transformers safetensors torch
-```
-
-### Basic Usage
+## Usage
 
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
-# Load model and tokenizer
 tokenizer = AutoTokenizer.from_pretrained("manojredhat/tiny-llama")
 model = AutoModelForCausalLM.from_pretrained("manojredhat/tiny-llama")
 
-# Generate text
-prompt = "Once upon a time"
-input_ids = tokenizer(prompt, return_tensors="pt").input_ids
-
-with torch.no_grad():
-    output = model.generate(input_ids, max_length=100, temperature=0.8, top_p=0.95)
-
-generated_text = tokenizer.decode(output[0])
-print(generated_text)
+inputs = tokenizer("Once upon a time", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
-### Advanced Generation
-
-```python
-# With more control
-output = model.generate(
-    input_ids,
-    max_length=150,
-    temperature=0.7,
-    top_p=0.9,
-    num_beams=1,
-    do_sample=True,
-    pad_token_id=tokenizer.eos_token_id,
-)
-
-# Batch generation
-batch_prompts = [
-    "Once upon a time",
-    "The girl went to",
-    "In a small village"
-]
-inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True)
-outputs = model.generate(**inputs, max_length=100)
-texts = tokenizer.batch_decode(outputs)
-```
-
-## Model Architecture
-
-### Layer Structure
-1. Embedding Layer (512 tokens → 256 dims)
-2. 6 Transformer Blocks:
-   - Multi-Head Self-Attention (8 heads)
-   - RMS Normalization
-   - Feed-Forward Network (4x hidden size)
-   - Residual Connections
-3. Output Projection (256 dims → 512 tokens)
-
-### Attention Details
-- **Type**: Multi-Head Self-Attention
-- **Heads**: 8
-- **Head Dimension**: 32
-- **Rotary Embeddings (RoPE)**: Yes
-- **Query-Key Normalization**: RMS Norm
-
-### Activation Function
-- **Feed-Forward**: SiLU (Swish)
-- **Normalization**: RMS Norm (ε=1e-5)
-
 ## Tokenizer
 
-- **Type**: SentencePiece
-- **Vocabulary Size**: 512 tokens
-- **Special Tokens**:
-  - `<s>` (BOS): Token ID 1
-  - `</s>` (EOS): Token ID 2
-  - `<unk>` (UNK): Token ID 0
+The model uses a SentencePiece tokenizer with 512 tokens:
 
-## Performance
-
-Typical inference speed on different hardware:
-- **CPU**: ~50-100 tokens/sec
-- **GPU (RTX 3090)**: ~500-1000 tokens/sec
-- **GPU (A100)**: ~2000+ tokens/sec
-
-Memory requirements:
-- **Model weights**: ~24MB (fp32)
-- **Inference memory**: ~200-300MB
-
-## Training Details
-
-### Dataset
-- Source: TinyStories (Roneneldan et al.)
-- Stories about simple, everyday events
-- ~50 shards, ~1.5GB total
-- Pre-tokenized to uint16 arrays
-
-### Optimization
-- **Optimizer**: AdamW
-- **Learning Rate**: 1e-3 (with cosine annealing)
-- **Batch Size**: 64
-- **Gradient Accumulation**: 8 steps
-- **Warmup**: 100 iterations
-
-### Convergence
-```
-Iteration   Train Loss   Val Loss
-0           6.27         6.29
-50          5.24         5.31
-100         4.81         4.77
-```
-
-## Limitations
-
-1. **Knowledge Cutoff**: Trained only on TinyStories dataset
-2. **Output Quality**: Designed for short stories, may struggle with other domains
-3. **Vocabulary**: 512-token vocabulary is limited (compared to full LLaMA's 32k)
-4. **Sequence Length**: Max 2048 tokens
-5. **Fine-tuning**: Intended for inference, may require retraining for other tasks
-
-## Use Cases
-
-✓ Educational purposes
-✓ Lightweight story generation
-✓ Research on small language models
-✓ Inference on CPU/edge devices
-✓ Fine-tuning on smaller datasets
-
-✗ Production deployments
-✗ Knowledge-intensive tasks
-✗ Long-form content generation
-✗ Multilingual tasks
-
-## Files in This Repository
-
-- `model.safetensors` - Model weights in safetensors format (fp32)
-- `config.json` - Model configuration
-- `tokenizer.model` - SentencePiece tokenizer vocabulary
-- `tokenizer_config.json` - Tokenizer configuration
-- `README.md` - This file
-
-## Citation
-
-If you use this model in your research, please cite:
-
-```bibtex
-@article{tinystories,
-  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
-  author={Eldan, Ronen and Li, Yonatan},
-  journal={arXiv preprint arXiv:2305.07759},
-  year={2023}
-}
-
-@article{llama2,
-  title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
-  author={Touvron, Hugo and others},
-  journal={arXiv preprint arXiv:2307.09288},
-  year={2023}
-}
-```
-
-## License
-
-This model is provided as-is for educational and research purposes.
-
-## Contact & Feedback
-
-Created with PyTorch and transformers library.
-For questions or issues, please open an issue on the model repository.
+- `<unk>`: token ID 0
+- `<s>`: token ID 1
+- `</s>`: token ID 2
+
+## Notes
+
+This is an educational small model trained for short TinyStories-style text.
+It is not intended for production use, knowledge-intensive tasks, or long-form
+generation.
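One way to read the loss figures kept in the Training section: with cross-entropy measured in nats, perplexity is exp(loss). A quick arithmetic check (not part of the commit):

```python
import math

# Reported losses -> perplexities (perplexity = exp(cross-entropy in nats)).
print(round(math.exp(6.27)))  # ~528: initial train perplexity
print(round(math.exp(4.81)))  # ~123: final train perplexity
print(round(math.exp(4.77)))  # ~118: final validation perplexity
```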
config.json CHANGED
@@ -9,7 +9,7 @@
   "hidden_size": 288,
   "initializer_range": 0.02,
   "intermediate_size": 768,
-  "max_position_embeddings": 2048,
+  "max_position_embeddings": 256,
   "model_type": "llama",
   "num_attention_heads": 6,
   "num_hidden_layers": 6,
@@ -24,4 +24,4 @@
   "transformers_version": "4.36.0",
   "use_cache": true,
   "vocab_size": 512
-}
+}
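The `max_position_embeddings` correction (2048 → 256) can be confirmed against the published config with the standard `transformers` API. A small sketch, not part of the commit; the expected values mirror this diff:

```python
from transformers import AutoConfig

# Load the published config and confirm the corrected metadata.
cfg = AutoConfig.from_pretrained("manojredhat/tiny-llama")
assert cfg.max_position_embeddings == 256   # was 2048 before this commit
assert cfg.num_attention_heads == 6
assert cfg.hidden_size == 288 and cfg.intermediate_size == 768
assert cfg.vocab_size == 512
```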
tokenizer_config.json CHANGED
@@ -3,7 +3,7 @@
   "add_eos_token": false,
   "add_prefix_space": false,
   "legacy": false,
-  "model_max_length": 2048,
+  "model_max_length": 256,
   "tokenizer_class": "LlamaTokenizer",
   "pad_token": "<unk>",
   "bos_token": {
@@ -30,4 +30,4 @@
   "rstrip": false,
   "single_word": false
   }
-}
+}
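Likewise, the tokenizer's `model_max_length` now matches the 256-token training sequence length. A quick check of the published tokenizer (not part of the commit; expected values taken from this diff and the model card):

```python
from transformers import AutoTokenizer

# Confirm the corrected tokenizer metadata and special-token IDs.
tok = AutoTokenizer.from_pretrained("manojredhat/tiny-llama")
assert tok.model_max_length == 256          # was 2048 before this commit
assert tok.unk_token_id == 0                # <unk>
assert tok.bos_token_id == 1                # <s>
assert tok.eos_token_id == 2                # </s>
print(tok.vocab_size)                       # expected: 512
```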