manojredhat committed
Commit 756e974 · verified · 1 Parent(s): 8f4b37b

Correct Tiny LLaMA model metadata

Files changed (4)
  1. MODEL_CARD.md +20 -44
  2. README.md +27 -174
  3. config.json +2 -2
  4. tokenizer_config.json +2 -2
MODEL_CARD.md CHANGED
@@ -5,70 +5,46 @@ license: apache-2.0
 
 # Tiny LLaMA
 
-A small LLaMA-2 inspired language model trained on TinyStories dataset.
-
-## Overview
-
-Tiny LLaMA is a 6.1M parameter language model designed for:
-- Educational purposes
-- Research on small models
-- Lightweight inference
-- Fine-tuning experiments
+A 6.27M parameter LLaMA-style causal language model trained on TinyStories.
 
 ## Model Specifications
 
 | Property | Value |
 |----------|-------|
-| Parameters | 6.1M |
+| Parameters | 6,270,624 |
 | Layers | 6 |
-| Attention Heads | 8 |
-| Hidden Dimension | 256 |
+| Attention Heads | 6 |
+| Key/Value Heads | 6 |
+| Head Dimension | 48 |
+| Hidden Size | 288 |
+| Intermediate Size | 768 |
 | Vocabulary Size | 512 |
-| Max Sequence Length | 2048 |
+| Training Sequence Length | 256 |
 | Data Type | float32 |
 
 ## Intended Use
 
-This model is intended for:
-- Text generation in the style of TinyStories
-- Research and educational purposes
-- Demonstration of language model capabilities at small scale
+- TinyStories-style text generation
+- Educational examples
+- Small-model research
+- ASHA backend inference testing
 
 ## Out-of-Scope Uses
 
-This model is not suitable for:
 - Production deployments
 - Knowledge-intensive tasks
-- Long-form document generation
-- Non-English content generation
+- Long-form generation
+- Multilingual generation
 
-## Training Data
-
-Trained on TinyStories dataset consisting of 50 shards of simple English stories.
-
-## Tokenizer
-
-Uses SentencePiece tokenizer with 512 vocabulary tokens, trained on the TinyStories dataset.
-
-## Performance Benchmarks
-
-- **Load Time**: ~50ms
-- **Inference Speed (CPU)**: 50-100 tokens/sec
-- **Memory (Weights)**: 24MB
-
-## How to Use
+## Usage
 
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
-tokenizer = AutoTokenizer.from_pretrained("username/tiny-llama")
-model = AutoModelForCausalLM.from_pretrained("username/tiny-llama")
+tokenizer = AutoTokenizer.from_pretrained("manojredhat/tiny-llama")
+model = AutoModelForCausalLM.from_pretrained("manojredhat/tiny-llama")
 
 inputs = tokenizer("Once upon a time", return_tensors="pt")
-outputs = model.generate(**inputs, max_length=100)
-print(tokenizer.decode(outputs[0]))
+outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
-
-## Ethical Considerations
-
-This model is trained on simple children's stories and is intended for educational use only.
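The corrected parameter count in the new specification table can be verified by hand. Below is a minimal sketch (not part of the commit), assuming the standard Llama layout with untied embeddings and a SwiGLU MLP, which is what `llama2.c` checkpoints use:

```python
# Back-of-the-envelope parameter count for the corrected Tiny LLaMA specs.
# Assumes untied embeddings and a SwiGLU MLP (gate/up/down projections).
vocab, hidden, inter, layers = 512, 288, 768, 6

embed = vocab * hidden            # input embedding: 147,456
attn = 4 * hidden * hidden        # wq, wk, wv, wo:  331,776 per layer
mlp = 3 * hidden * inter          # gate, up, down:  663,552 per layer
norms = 2 * hidden                # two RMSNorms:        576 per layer
final_norm = hidden               # final RMSNorm:       288
lm_head = vocab * hidden          # output head:     147,456

total = embed + layers * (attn + mlp + norms) + final_norm + lm_head
print(total)  # 6,270,624 -- matches the table
```

A tied-embedding variant would give 6,123,168, so the published 6,270,624 figure implies a separate input embedding and output head.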
 
 
 
 
README.md CHANGED
@@ -18,203 +18,56 @@ model-index:
 
 # Tiny LLaMA - TinyStories Edition
 
-A lightweight LLaMA-2 inspired model trained on the TinyStories dataset. This model is designed for educational purposes and lightweight inference.
+A small LLaMA-style causal language model trained on the TinyStories dataset.
+This repository contains the Hugging Face `LlamaForCausalLM` conversion of the
+local checkpoint from `/home/manojk/small_llama/llama2.c/out/ckpt.pt`.
 
 ## Model Details
 
-- **Model Type**: Decoder-only Transformer (LLaMA architecture)
-- **Parameters**: 6.1M
+- **Model Type**: Decoder-only Transformer (`LlamaForCausalLM`)
+- **Parameters**: 6,270,624
 - **Layers**: 6
-- **Attention Heads**: 8
-- **Embedding Dimension**: 256
-- **Vocabulary Size**: 512 (SentencePiece)
-- **Max Sequence Length**: 2048
+- **Attention Heads**: 6
+- **Key/Value Heads**: 6
+- **Head Dimension**: 48
+- **Hidden Size**: 288
+- **Intermediate Size**: 768
+- **Vocabulary Size**: 512
+- **Training Sequence Length**: 256
 - **Data Type**: float32
 - **Format**: safetensors
 
 ## Training
 
-- **Dataset**: TinyStories (roneneldan/TinyStories)
-- **Data Shards**: 50
+- **Dataset**: TinyStories
 - **Training Iterations**: 100
 - **Initial Loss**: 6.27
 - **Final Loss**: 4.81
-- **Validation Loss**: 6.29 4.77
+- **Validation Loss**: 6.29 to 4.77
 
-## Quick Start
-
-### Installation
-
-```bash
-pip install transformers safetensors torch
-```
-
-### Basic Usage
+## Usage
 
 ```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
-# Load model and tokenizer
 tokenizer = AutoTokenizer.from_pretrained("manojredhat/tiny-llama")
 model = AutoModelForCausalLM.from_pretrained("manojredhat/tiny-llama")
 
-# Generate text
-prompt = "Once upon a time"
-input_ids = tokenizer(prompt, return_tensors="pt").input_ids
-
-with torch.no_grad():
-    output = model.generate(input_ids, max_length=100, temperature=0.8, top_p=0.95)
-
-generated_text = tokenizer.decode(output[0])
-print(generated_text)
+inputs = tokenizer("Once upon a time", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
-### Advanced Generation
-
-```python
-# With more control
-output = model.generate(
-    input_ids,
-    max_length=150,
-    temperature=0.7,
-    top_p=0.9,
-    num_beams=1,
-    do_sample=True,
-    pad_token_id=tokenizer.eos_token_id,
-)
-
-# Batch generation
-batch_prompts = [
-    "Once upon a time",
-    "The girl went to",
-    "In a small village"
-]
-inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True)
-outputs = model.generate(**inputs, max_length=100)
-texts = tokenizer.batch_decode(outputs)
-```
-
-## Model Architecture
-
-### Layer Structure
-1. Embedding Layer (512 tokens → 256 dims)
-2. 6 Transformer Blocks:
-   - Multi-Head Self-Attention (8 heads)
-   - RMS Normalization
-   - Feed-Forward Network (4x hidden size)
-   - Residual Connections
-3. Output Projection (256 dims → 512 tokens)
-
-### Attention Details
-- **Type**: Multi-Head Self-Attention
-- **Heads**: 8
-- **Head Dimension**: 32
-- **Rotary Embeddings (RoPE)**: Yes
-- **Query-Key Normalization**: RMS Norm
-
-### Activation Function
-- **Feed-Forward**: SiLU (Swish)
-- **Normalization**: RMS Norm (ε=1e-5)
-
 ## Tokenizer
 
-- **Type**: SentencePiece
-- **Vocabulary Size**: 512 tokens
-- **Special Tokens**:
-  - `<s>` (BOS): Token ID 1
-  - `</s>` (EOS): Token ID 2
-  - `<unk>` (UNK): Token ID 0
+The model uses a SentencePiece tokenizer with 512 tokens:
 
-## Performance
-
-Typical inference speed on different hardware:
-- **CPU**: ~50-100 tokens/sec
-- **GPU (RTX 3090)**: ~500-1000 tokens/sec
-- **GPU (A100)**: ~2000+ tokens/sec
-
-Memory requirements:
-- **Model weights**: ~24MB (fp32)
-- **Inference memory**: ~200-300MB
-
-## Training Details
-
-### Dataset
-- Source: TinyStories (Roneneldan et al.)
-- Stories about simple, everyday events
-- ~50 shards, ~1.5GB total
-- Pre-tokenized to uint16 arrays
-
-### Optimization
-- **Optimizer**: AdamW
-- **Learning Rate**: 1e-3 (with cosine annealing)
-- **Batch Size**: 64
-- **Gradient Accumulation**: 8 steps
-- **Warmup**: 100 iterations
-
-### Convergence
-```
-Iteration   Train Loss   Val Loss
-0           6.27         6.29
-50          5.24         5.31
-100         4.81         4.77
-```
-
-## Limitations
-
-1. **Knowledge Cutoff**: Trained only on TinyStories dataset
-2. **Output Quality**: Designed for short stories, may struggle with other domains
-3. **Vocabulary**: 512-token vocabulary is limited (compared to full LLaMA's 32k)
-4. **Sequence Length**: Max 2048 tokens
-5. **Fine-tuning**: Intended for inference, may require retraining for other tasks
-
-## Use Cases
-
-✓ Educational purposes
-✓ Lightweight story generation
-✓ Research on small language models
-✓ Inference on CPU/edge devices
-✓ Fine-tuning on smaller datasets
-
-✗ Production deployments
-✗ Knowledge-intensive tasks
-✗ Long-form content generation
-✗ Multilingual tasks
-
-## Files in This Repository
-
-- `model.safetensors` - Model weights in safetensors format (fp32)
-- `config.json` - Model configuration
-- `tokenizer.model` - SentencePiece tokenizer vocabulary
-- `tokenizer_config.json` - Tokenizer configuration
-- `README.md` - This file
-
-## Citation
-
-If you use this model in your research, please cite:
-
-```bibtex
-@article{tinystories,
-  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
-  author={Eldan, Ronen and Li, Yonatan},
-  journal={arXiv preprint arXiv:2305.07759},
-  year={2023}
-}
-
-@article{llama2,
-  title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
-  author={Touvron, Hugo and others},
-  journal={arXiv preprint arXiv:2307.09288},
-  year={2023}
-}
-```
-
-## License
-
-This model is provided as-is for educational and research purposes.
-
-## Contact & Feedback
-
-Created with PyTorch and transformers library.
-For questions or issues, please open an issue on the model repository.
+- `<unk>`: token ID 0
+- `<s>`: token ID 1
+- `</s>`: token ID 2
+
+## Notes
+
+This is an educational small model trained for short TinyStories-style text.
+It is not intended for production use, knowledge-intensive tasks, or long-form
+generation.
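One way to read the loss figures kept in the Training section: with cross-entropy measured in nats, perplexity is exp(loss). A quick arithmetic check (not part of the commit):

```python
import math

# Reported losses -> perplexities (perplexity = exp(cross-entropy in nats)).
print(round(math.exp(6.27)))  # ~528: initial train perplexity
print(round(math.exp(4.81)))  # ~123: final train perplexity
print(round(math.exp(4.77)))  # ~118: final validation perplexity
```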
config.json CHANGED
@@ -9,7 +9,7 @@
   "hidden_size": 288,
   "initializer_range": 0.02,
   "intermediate_size": 768,
-  "max_position_embeddings": 2048,
+  "max_position_embeddings": 256,
   "model_type": "llama",
   "num_attention_heads": 6,
   "num_hidden_layers": 6,
@@ -24,4 +24,4 @@
   "transformers_version": "4.36.0",
   "use_cache": true,
   "vocab_size": 512
-}
+}
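The `max_position_embeddings` correction (2048 → 256) can be confirmed against the published config with the standard `transformers` API. A small sketch, not part of the commit; the expected values mirror this diff:

```python
from transformers import AutoConfig

# Load the published config and confirm the corrected metadata.
cfg = AutoConfig.from_pretrained("manojredhat/tiny-llama")
assert cfg.max_position_embeddings == 256   # was 2048 before this commit
assert cfg.num_attention_heads == 6
assert cfg.hidden_size == 288 and cfg.intermediate_size == 768
assert cfg.vocab_size == 512
```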
tokenizer_config.json CHANGED
@@ -3,7 +3,7 @@
   "add_eos_token": false,
   "add_prefix_space": false,
   "legacy": false,
-  "model_max_length": 2048,
+  "model_max_length": 256,
   "tokenizer_class": "LlamaTokenizer",
   "pad_token": "<unk>",
   "bos_token": {
@@ -30,4 +30,4 @@
   "rstrip": false,
   "single_word": false
   }
-}
+}
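Likewise, the tokenizer's `model_max_length` now matches the 256-token training sequence length. A quick check of the published tokenizer (not part of the commit; expected values taken from this diff and the model card):

```python
from transformers import AutoTokenizer

# Confirm the corrected tokenizer metadata and special-token IDs.
tok = AutoTokenizer.from_pretrained("manojredhat/tiny-llama")
assert tok.model_max_length == 256          # was 2048 before this commit
assert tok.unk_token_id == 0                # <unk>
assert tok.bos_token_id == 1                # <s>
assert tok.eos_token_id == 2                # </s>
print(tok.vocab_size)                       # expected: 512
```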