---
license: apache-2.0
datasets:
- OLMo-Coding/starcoder-python-instruct
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- python
- code
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
---

# CinnabarLM Python
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a T4 GPU (on Colab)! It's only 16 MB in size, and it's now Llama-based!
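
Since it's a standard Llama-architecture model, it should load with plain Hugging Face Transformers. Here's a minimal usage sketch; the repo id below is a placeholder, so swap in the actual model id from this page:

```python
# Minimal usage sketch. The repo id is a placeholder, not the real one:
# replace it with the model id shown on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<user>/cinnabarlm-python"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# It's a base model, so it only completes text: give it the start of some Python.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```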

# Why?
Because it's a good idea to make tiny LLMs. Some people have already done it, with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't, until now!

# Differences from Preview
* It's now Llama-based; the Preview was a custom architecture.
* And of course, it's stable now: it no longer generates gibberish!

# Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 32 (8 per device x 4 accumulation steps) |
| Context Window | 2048 tokens (set by `max_position_embeddings`) |
| `hidden_size` | 192 |
| `intermediate_size` | 192 |
| `num_hidden_layers` | 6 |
| `num_attention_heads` | 6 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | `1e-5` |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
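
For reference, here's roughly what that looks like in code. This is a sketch reconstructed from the table above using the stock `transformers` `LlamaConfig`; anything not listed in the table is an assumption left at its default:

```python
# Sketch of the model config, reconstructed from the table above.
# Values not listed in the table stay at LlamaConfig defaults.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=4096,
    hidden_size=192,
    intermediate_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,
    initializer_range=0.02,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # prints the total parameter count
```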

# Training Configurations
| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
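
These hyperparameters map directly onto the Hugging Face `Trainer`. Here's a minimal sketch of the training setup under the table's values; `tokenizer` and `train_dataset` are stand-in names for the tokenizer and the tokenized dataset, and `model` can be the `LlamaForCausalLM` from the config sketch above:

```python
# Training sketch built from the hyperparameter table above.
# Assumes `model`, `tokenizer`, and `train_dataset` are already set up.
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./cinnabarlm-v2",
    max_steps=10000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=6e-4,
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",
    logging_steps=100,
    save_steps=2000,
    fp16=True,
    save_total_limit=2,
    prediction_loss_only=True,
    logging_first_step=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # mlm=False makes the collator produce causal-LM labels (next-token prediction)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```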

# Limitations
* **Not Instruction-Tuned:** It's a base model, so it only completes text.
* **Python-Only:** It's trained exclusively on Python code (The Stack).

# Some Other Details
* It's trained on ~70 million tokens of [The Stack](https://huggingface.co/datasets/OLMo-Coding/starcoder-python-instruct).
* The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).