karpathy committed on
Commit
cbffbc9
1 Parent(s): 302d182

Upload Stories260K model

Adds the Stories260K model, a tiny model intended for unit tests and similar uses.
Includes the files needed for inference both in Python and in run.c.
See readme.md for more details.

stories260K/readme.md ADDED
@@ -0,0 +1,54 @@
+ This 260K model is a tiny model and it was trained as follows:
+
+ ```
+ python train.py \
+     --out_dir="outmini" \
+     --batch_size=128 \
+     --max_seq_len=512 \
+     --gradient_accumulation_steps=1 \
+     --vocab_source="custom" \
+     --vocab_size=512 \
+     --dim=64 \
+     --n_layers=5 \
+     --n_heads=8 \
+     --n_kv_heads=4 \
+     --multiple_of=4 \
+     --learning_rate=1e-3 \
+     --dropout=0.05 \
+     --weight_decay=0.01 \
+     --max_iters=100000 \
+     --beta2=0.99 \
+     --warmup_iters=1000 \
+     --eval_interval=2000 \
+     --eval_iters=100 \
+     --compile=True
+ ```
+
+ You'll notice that `n_kv_heads` is 4 while `n_heads` is 8, so every two query heads share one key/value projection, i.e. this model is 2X multiquery (grouped-query attention with 4 key/value heads). You'll also notice that we're using a custom tokenizer with 512 tokens. The model trained for ~10 minutes (?) on my A100 and achieves a validation loss of 1.2968.
+
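To make the head sharing concrete, here is a minimal sketch of which key/value head each query head would read from. It assumes that consecutive query heads form a group (integer division of the head index by the group size), which the README does not state explicitly; `kv_head_for_query_head` is a hypothetical helper name, not a function from `train.py` or run.c.

```python
def kv_head_for_query_head(q_head: int, n_heads: int = 8, n_kv_heads: int = 4) -> int:
    """Return the index of the key/value head that query head `q_head` uses.

    Assumption: consecutive query heads are grouped together, so the group
    index is simply q_head // group_size.
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_heads:
        raise ValueError("q_head out of range")
    group_size = n_heads // n_kv_heads  # 2 for this model
    return q_head // group_size


# One entry per query head, using the flags from the training command.
mapping = [kv_head_for_query_head(h) for h in range(8)]
print(mapping)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With `n_heads=8` and `n_kv_heads=4`, the key and value projections are half the width of the query projection, which is where the parameter saving comes from.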
+ Sampling this model at temperature 0.0 (i.e. deterministic greedy argmax sampling) gives:
+
+ ```
+ $ ./run stories260K/stories260K.bin -z stories260K/tok512.bin -t 0.0
+ Once upon a time, there was a little girl named Lily. She loved to play outside in the park. One day, she saw a big, red ball. She wanted to play with it, but it was too high.
+ Lily's mom said, "Lily, let's go to the park." Lily was sad and didn't know what to do. She said, "I want to play with your ball, but I can't find it."
+ Lily was sad and didn't know what to do. She said, "I'm sorry, Lily. I didn't know what to do."
+ Lily didn't want to help her mom, so she said, "I'm sorry, mom. I didn't know what to do." Her mom said, "Don't worry, Lily. We can help you.
+ ```
+
+ You can reproduce the same in Python by running `sample.py`:
+
+ ```
+ $ python sample.py --checkpoint=stories260K/stories260K.pt --tokenizer=stories260K/tok512.model --temperature=0.0 --max_new_tokens=257
+ ```
+
+ I set `--max_new_tokens=257` manually because `sample.py` doesn't currently terminate on the special BOS token the way run.c does. Sampling at temperature 1.0 with top-p of 0.9 gives somewhat more reasonable samples:
+
+ ```
+ $ ./run stories260K/stories260K.bin -z stories260K/tok512.bin -t 1.0 -p 0.9 -s 133742
+ Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and eat sandwiches. One day, Timmy's mom told him it was time to rest for a while. Timmy's friend Billy came over and took him a down.
+ Timmy's mom saw that Timmy was sad, but Timmy said, "I didn't understand what is it! We need to find some leafs." Timmy thought about it and took a deep breath on a spoon. He hoped it was important to be kind and continued to find its image next time.
+ After they finished getting, Timmy's dad came up to his house and promised to help Timmy.
+ ```
+
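The two sampling modes used above (temperature 0.0, and temperature 1.0 with top-p 0.9) can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the code from run.c or `sample.py`; `softmax` and `sample_next` are hypothetical names, and the logits in the usage lines are made up.

```python
import math
import random


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=1.0, top_p=0.9, rng=None):
    """Pick the next token id from raw logits."""
    if temperature == 0.0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    # Nucleus (top-p): keep the most likely tokens until their
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample inside the kept set, implicitly renormalizing by `cumulative`.
    threshold = rng.random() * cumulative
    running = 0.0
    for i in kept:
        running += probs[i]
        if threshold < running:
            return i
    return kept[-1]  # guard against floating-point rounding


print(sample_next([0.1, 2.0, -1.0], temperature=0.0))  # 1
print(sample_next([0.1, 2.0, -1.0], temperature=1.0, top_p=0.9, rng=random.Random(133742)))
```

At temperature 0.0 the result is deterministic, which is why that setting is convenient for unit tests; the top-p path depends on the seed, so its output is not shown.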
+ Hey, you can't expect too much from a 260K-parameter model. I'm even mildly shocked we get this far :D
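The 260K figure can be sanity-checked from the training flags. The sketch below assumes a Llama-2-style layout, which the README does not spell out: bias-free linear layers, two norm weight vectors per layer plus a final one, a three-matrix feed-forward block whose hidden size is 2/3 of `4 * dim` rounded up to `multiple_of`, and an output classifier that shares the token embedding. The size estimate further assumes an export made of a 7-integer header, float32 weights, and two rotary-embedding tables. `llama_param_count` and `legacy_bin_size` are hypothetical helper names.

```python
def llama_param_count(dim, n_layers, n_heads, n_kv_heads, vocab_size, multiple_of):
    """Parameter count under the assumed Llama-2-style layout (tied embeddings)."""
    head_dim = dim // n_heads
    kv_dim = n_kv_heads * head_dim              # width of the key/value projections
    hidden = int(2 * (4 * dim) / 3)             # assumed feed-forward sizing rule
    hidden = multiple_of * ((hidden + multiple_of - 1) // multiple_of)
    attention = dim * dim + 2 * dim * kv_dim + dim * dim  # wq, wk, wv, wo
    feed_forward = 3 * dim * hidden                        # w1, w2, w3
    norms = 2 * dim                                        # two norm vectors per layer
    per_layer = attention + feed_forward + norms
    return vocab_size * dim + n_layers * per_layer + dim   # embedding + layers + final norm


def legacy_bin_size(n_params, dim, n_heads, max_seq_len):
    """Estimated .bin size in bytes under the assumed export layout."""
    head_dim = dim // n_heads
    rope_floats = 2 * max_seq_len * (head_dim // 2)  # assumed cos and sin tables
    return 7 * 4 + 4 * (n_params + rope_floats)      # int32 header + float32 payload


n_params = llama_param_count(dim=64, n_layers=5, n_heads=8, n_kv_heads=4,
                             vocab_size=512, multiple_of=4)
print(n_params)                                                        # 260032
print(legacy_bin_size(n_params, dim=64, n_heads=8, max_seq_len=512))  # 1056540
```

Under these assumptions the count comes to 260,032 parameters, and the size estimate equals the `size 1056540` recorded in the LFS pointer for `stories260K/stories260K.bin` in this commit.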
stories260K/stories260K.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0a507e7ad0f626624f17112325e66691f9076d622e1d3274d103d00299f2696
+ size 1056540
stories260K/stories260K.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eec953f9d0f139e894ef8996302680e64b24813c7a98425424f5c85f7cf4abb1
+ size 1061090
stories260K/tok512.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6e45b754b603ab1fb1a31e59c1ebbee92a789504c3ddf6debb6bd3c106222d6
+ size 5075
stories260K/tok512.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dfff07d929db979913f166ec94a6f5ecad4c70cfed8eb5c9cbe7e464455e46f5
+ size 7645
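The four binary files are stored as Git LFS pointers, so the diff shows only a spec version, a SHA-256 object id, and a byte size. As a rough sketch, a downloaded file could be checked against its pointer like this; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is the one shown above for `stories260K/tok512.bin`.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e6e45b754b603ab1fb1a31e59c1ebbee92a789504c3ddf6debb6bd3c106222d6
size 5075
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the size and digest the pointer records."""
    data = Path(path).read_bytes()
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])  # sha256 5075
```

Calling `matches_pointer("stories260K/tok512.bin", pointer)` on a local copy would then report whether the download is intact; the path is whatever location the file was saved to.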