---
license: apache-2.0
datasets:
- OLMo-Coding/starcoder-python-instruct
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- python
- code
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
---

# CinnabarLM Python
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a T4 GPU (on Colab)! It's only 16 MB in size, and it's now Llama-based!
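
Since it's a standard Llama-architecture model, it should load with plain Hugging Face Transformers. Here's a minimal usage sketch; the repo id below is a placeholder, so swap in the actual model id from this page:

```python
# Minimal usage sketch. The repo id is a placeholder, not the real one:
# replace it with the model id shown on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<user>/cinnabarlm-python"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# It's a base model, so it only completes text: give it the start of some Python.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```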

# Why?
Because it's a good idea to make tiny LLMs. Some people have already done it, with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't, until now!

# Differences from Preview
* It's now Llama-based; the Preview was a custom architecture.
* And of course, it's stable now: it no longer generates gibberish!

# Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 32 (8 per device x 4 accumulation steps) |
| Context Window | 2048 tokens (set by `max_position_embeddings`) |
| `hidden_size` | 192 |
| `intermediate_size` | 192 |
| `num_hidden_layers` | 6 |
| `num_attention_heads` | 6 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | `1e-5` |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
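
For reference, here's roughly what that looks like in code. This is a sketch reconstructed from the table above using the stock `transformers` `LlamaConfig`; anything not listed in the table is an assumption left at its default:

```python
# Sketch of the model config, reconstructed from the table above.
# Values not listed in the table stay at LlamaConfig defaults.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=4096,
    hidden_size=192,
    intermediate_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,
    initializer_range=0.02,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # prints the total parameter count
```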

# Training Configurations
| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
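
These hyperparameters map directly onto the Hugging Face `Trainer`. Here's a minimal sketch of the training setup under the table's values; `tokenizer` and `train_dataset` are stand-in names for the tokenizer and the tokenized dataset, and `model` can be the `LlamaForCausalLM` from the config sketch above:

```python
# Training sketch built from the hyperparameter table above.
# Assumes `model`, `tokenizer`, and `train_dataset` are already set up.
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./cinnabarlm-v2",
    max_steps=10000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=6e-4,
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",
    logging_steps=100,
    save_steps=2000,
    fp16=True,
    save_total_limit=2,
    prediction_loss_only=True,
    logging_first_step=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # mlm=False makes the collator produce causal-LM labels (next-token prediction)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```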

# Limitations
* **Not Instruction-Tuned:** It's a base model, so it only completes text.
* **Python-Only:** It's trained exclusively on Python code (The Stack).

# Some Other Details
* It's trained on ~70 million tokens of [The Stack](https://huggingface.co/datasets/OLMo-Coding/starcoder-python-instruct).
* The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).