MihaiPopa-1 committed on
Commit 4702bd2 · verified · 1 Parent(s): cc2f8d7

Create README.md

Files changed (1):
  1. README.md +70 -0

README.md ADDED

---
license: apache-2.0
datasets:
- OLMo-Coding/starcoder-python-instruct
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- python
- code
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
---

# CinnabarLM Python
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a T4 GPU (on Colab)! It's only 16 MB in size and now it's Llama-based!

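Since the checkpoint uses the standard Llama architecture, it should load with Hugging Face Transformers like any other causal LM. Here's a minimal sketch; the repo id `MihaiPopa-1/CinnabarLM-Python` is an assumption and may not match the actual model id on the Hub:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with the actual model id on the Hub.
generator = pipeline("text-generation", model="MihaiPopa-1/CinnabarLM-Python")

# It's a base model, so prompt it with the start of some Python code to complete.
prompt = "def fibonacci(n):\n"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```
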
# Why?
Because it's a good idea to make tiny LLMs. Other people have already made some, like [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't made one myself until now!

# Differences from Preview
* It's now Llama-based; the Preview was a custom model.
* And of course, it's stable now (it no longer generates gibberish / a mess of words)!

# Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 4 × 8 = 32 (effective, with gradient accumulation) |
| Context Window | 2048 tokens (from `max_position_embeddings`) |
| `hidden_size` | 192 |
| `intermediate_size` | 192 |
| `num_hidden_layers` | 6 |
| `num_attention_heads` | 6 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | `1e-5` |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |

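For reference, these settings would translate into a Hugging Face `LlamaConfig` roughly as follows (a sketch, not the exact config shipped in the repo; special-token ids are left at their defaults):

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of a config matching the table above (token ids left at defaults).
config = LlamaConfig(
    vocab_size=4096,
    hidden_size=192,
    intermediate_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,
    initializer_range=0.02,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # parameter count of this sketch
```
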
# Training Configurations
| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |

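These values map onto Hugging Face `TrainingArguments` like this (a sketch, assuming the standard `Trainer` API was used):

```python
from transformers import TrainingArguments

# Sketch of the training setup from the table above; assumes the stock HF Trainer.
training_args = TrainingArguments(
    output_dir="./cinnabarlm-v2",
    max_steps=10_000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=6e-4,
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",
    logging_steps=100,
    save_steps=2000,
    fp16=True,
    save_total_limit=2,
    prediction_loss_only=True,
    logging_first_step=True,
)
```
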
# Limitations
* **Not Instruction-Tuned:** It's only a base model, so it only completes text.
* **Python-Only:** It's trained on Python code (The Stack).

# Some other details
* It's trained on ~70 million tokens of [The Stack](https://huggingface.co/datasets/OLMo-Coding/starcoder-python-instruct).
* The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).