CompactAI committed 1d2ca43 (verified) · 1 parent: fa0878c

Update README.md

Files changed (1): README.md (+161, −3)
---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- haiku
---
# TMLM-Haiku-2

> A 1M-parameter language model that speaks English, technically.

**Warning:** This model was trained on a shoestring budget and a prayer. It does not answer questions correctly. It does not follow instructions well. It does, however, occasionally produce output that sounds profound until you read it twice.

---

## Overview

TMLM-Haiku-2 is a tiny autoregressive language model with approximately one million parameters. That is not a typo. In an era where models are measured in billions, we went the other direction, mostly because we could.

It was trained on English text at a ratio of 100 tokens per parameter. The math is simple: 1M parameters × 100 tokens/parameter ≈ 100M total tokens. We split this budget deliberately: roughly two-thirds (~66.7M tokens) went into general pretraining, after which the resulting checkpoint was fine-tuned on instruction data using the remaining one-third (~33.3M tokens).
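
The budget arithmetic above fits in a few lines (a minimal sketch using the card's own approximate figures):

```python
# Token-budget arithmetic for TMLM-Haiku-2 (approximate figures from the card).
params = 1_000_000
tokens_per_param = 100

total_tokens = params * tokens_per_param          # ~100M tokens in total
pretrain_tokens = total_tokens * 2 // 3           # ~66.7M for general pretraining
instruct_tokens = total_tokens - pretrain_tokens  # ~33.3M for instruction tuning

print(f"total:    {total_tokens:,}")
print(f"pretrain: {pretrain_tokens:,}")
print(f"instruct: {instruct_tokens:,}")
```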

This approach lets us squeeze more signal out of every parameter. It does not make the model smart. It just makes it slightly less confused than it would have been otherwise.

Curiously, the pretrained checkpoint tends to produce more fluent output than the instruction-tuned version. It turns out that teaching a model to follow directions might actually make it worse at saying things. Who knew.

---

## Efficiency Claim (The One We Are Allowed to Make)

At 100 tokens per parameter, TMLM-Haiku-2 is **one of the most data-efficient models at this scale**. Compared against another 1M-parameter model trained on fewer tokens, these weights would likely come out ahead.

We are not claiming state of the art. We are claiming that, for a model this small, we got more out of the data than most would. That is a modest victory, but we will take it.

---

## Sample Outputs

### Instruction-Tuned

```
User: Is food safe to eat?
Model: As an Age, in relation (a rush-rict), a decisios is the **exactly* estimate**
(each labeling at * *Crise*).

When you cannot find all faces (doinins one way tea t become flawed) are no
longer rartints like awarded, n but wheret dogs control crashed ttega y
```

### Pretrained

```
User: A cat, commonly known as the
Model: primary current burden of transportation wasnt impossible. A recence includes
a personal crisis for participated populatably, to make sure technology
systems change from an illustrious phase orbit oncold, rock areas, along
with localizators outside again
```

We recommend treating these outputs as experimental prose. If you find meaning in them, that is probably more about you than the model.

---

## Technical Specifications

| Attribute | Value |
|-----------|-------|
| Parameters | ~1,000,000 |
| Language | English |
| Tokenization | Word-level |
| Architecture | Lightweight Transformer |
| Total Tokens | ~100M (100 tokens/param) |
| Pretraining Tokens | ~66.7M (2/3 of budget) |
| Instruction Tokens | ~33.3M (1/3 of budget) |
| Target Throughput | ~1M tokens/sec |
| License | Apache-2.0 |
| Repository | https://huggingface.co/CompactAI/TMLM-Haiku-2 |
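
The table lists word-level tokenization. A minimal sketch of what that means (a hypothetical illustration; the model's actual vocabulary and tokenizer implementation are not described in this card):

```python
# Hypothetical word-level tokenizer sketch (NOT the model's actual tokenizer).
# Each whitespace-separated word maps to a single integer id.
def build_vocab(corpus):
    """Assign an integer id to every unique word, with an <unk> fallback at 0."""
    vocab = {"<unk>": 0}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Map each word to its id, falling back to <unk> for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]

vocab = build_vocab("a cat commonly known as the house cat")
print(encode("a cat known as the dog", vocab))  # → [1, 2, 4, 5, 6, 0]
```

Unlike subword schemes, every out-of-vocabulary word collapses to `<unk>`, which is part of why tiny word-level models get confused so easily.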

---

## Getting Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CompactAI/TMLM-Haiku-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A cat, commonly known as the"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Pro tip: adjust `temperature` between 0.8 and 1.2 for optimal levels of confusion.
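
What `temperature` actually does is rescale the logits before the softmax that sampling draws from: lower values sharpen the distribution, higher values flatten it. A standalone sketch, no model download required (`softmax_with_temperature` is our own illustrative helper, not part of any library):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # sharper: favors the top logit
hot = softmax_with_temperature(logits, 1.2)   # flatter: more "confusion"
print(cold[0] > hot[0])  # → True
```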

---

## Reasonable Use Cases

- Generating creative writing prompts that nobody asked for
- Studying how small models fail in charming ways
- Populating game worlds with NPCs that speak in riddles
- Teaching students why bigger is not always better (pun intended)
- Amusing yourself during long training runs

---

## Unreasonable Use Cases

- Anything requiring factual accuracy
- Customer support automation
- Medical, legal, or financial advice (oh hell no)
- Replacing a search engine
- Expecting the model to know what it is talking about

---

## Philosophy

TMLM-Haiku-2 exists because we wondered what would happen if we trained a very small model on a very large dataset and then asked it to talk. The answer, as you have seen, is complicated.

The training strategy was simple: allocate two-thirds of the token budget to broad pretraining, then use the remainder to nudge the model toward instruction following. This does not produce a capable assistant. It does produce a model that learned as much as it could, given the constraints.

This project is part of CompactAI, an ongoing effort to explore language modeling at the edge of feasibility. We believe that interesting things can happen when you remove the safety net of scale. Sometimes those things are useful. Sometimes they are just funny.

---

## Contributing

We welcome:

- Bug reports, especially those accompanied by entertaining failure cases
- Prompts that coax unexpectedly poetic output from the model
- Research collaborations focused on ultra-small model dynamics
- Feedback on how to make a 1M-parameter model slightly less confused

Note: if you report a bug, fixes will be included in later releases of TMLM-Haiku and any other affected variants.

Please do not submit pull requests that add more parameters. That defeats the purpose. Please.

---

## Citation

```bibtex
@misc{tmlm-haiku-2-2026,
  title={TMLM-Haiku-2: A 1M-Parameter English Language Model for Experimental Use},
  author={CompactAI},
  year={2026},
  howpublished={\url{https://huggingface.co/CompactAI/TMLM-Haiku-2}},
  note={Trained with hope. Deploy with caution.}
}
```

---

> The model generates text. Whether that text means anything is a question for philosophers.

Train small. Expect less. Laugh anyway.