N8Programs committed
Commit ac593f9 · verified · 1 Parent(s): 4029793

Upload 7 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ radar_chart.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,115 @@
- ---
- license: cc-by-sa-4.0
- ---
+ ## Model Summary
+
+ NextTerm-47M is a pretrained transformer with 47.2M parameters, trained on 1.9 billion tokens of augmented data from the On-Line Encyclopedia of Integer Sequences (OEIS). It is designed to predict the next term in integer sequences. It displays exceptional in-context learning capabilities and outperforms far larger generic LLMs on OEIS sequence-completion tasks. It supports MLX and Hugging Face Transformers.
+
+ The model is pretrained on sequences of length up to 1024 and may generalize to longer sequences, though this has not been extensively tested. All pretraining was done in `float32` on a single RunPod H100 using MLX's CUDA backend. The model was trained for 335,000 steps with a batch size of 32 sequences, using Muon with a learning rate of 1e-2 and AdamW as the fallback optimizer with a learning rate of 1e-4. It uses the Qwen3 architecture with 12 layers, a model dimension of 512, 8 attention heads, and a feedforward dimension of 2048. Training used an estimated 85 exaFLOPs.
+
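The reported parameter count is consistent with this configuration. As a rough, back-of-the-envelope sanity check (ignoring the small RMSNorm terms), the architecture details above and the `config.json` in this commit give approximately:

```python
# Rough parameter count for the stated Qwen3 config: d_model=512, 12 layers,
# 8 query heads + 4 KV heads of dim 64, FFN 2048, vocab 15, untied embeddings.
# Norm parameters are ignored as negligible.
d, layers, ffn, vocab, head_dim = 512, 12, 2048, 15, 64
qkv_out = head_dim * (8 + 4 + 4)               # q/k/v output dims: 512 + 256 + 256
attn = d * qkv_out + d * d                     # q/k/v projections + output projection
mlp = 3 * d * ffn                              # gate, up, and down projections (SwiGLU)
total = layers * (attn + mlp) + 2 * vocab * d  # plus input embedding and LM head
print(f"{total / 1e6:.1f}M parameters")        # ~47.2M, matching the reported size
```
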
+ The model's tokenizer accepts integer sequences formatted as comma-separated values, e.g. "1,-2,3,-4,". The model outputs the next terms of the sequence in the same format. All characters other than digits, commas, and minus signs are ignored by the tokenizer. Note that the model has not been trained on numbers with leading zeros, so inputs like "01,02,03," are out-of-distribution and may yield unpredictable results. The tokenizer encodes digits individually, so larger integers are represented by multiple tokens (e.g. "123" is tokenized as "1", "2", "3"). This means there is no magnitude limit on the integers the model can handle, but longer integers consume more of the model's context window.
+
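To make the digit-level tokenization concrete, the tokenizer can be inspected directly (a minimal sketch, assuming the repo loads via `transformers` as in the Usage section below):

```python
from transformers import AutoTokenizer

# Inspect how a comma-separated sequence is split into per-character tokens.
tokenizer = AutoTokenizer.from_pretrained("N8Programs/NextTerm-47M")
ids = tokenizer("12,-3,")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# Expected: ['<bos>', '1', '2', ',', '-', '3', ',']
# (every digit, comma, and minus sign is its own token)
```
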
+ ### Inference Examples
+
+ Successful predictions by NextTerm-47M on various integer sequences:
+ ```
+ 1,2,3,4,5, -> 6,7,8,9,10,11,12,...
+ 3,2,5,3,2,7,5,3,2,11,7,5,3, -> 2,13,11,7,5,3,2,17,...
+ 10001,10010,10100, -> 11000,100001,100010,100100,101000,110000,1000001,...
+ 123,456,789,101112, -> 131415,161718,192021,222324,252627,282930,...
+ 3,9,4,16,5, -> 25,6,36,7,49,8,64,...
+ ```
+
+ Not all predictions are successful; an example failure case is shown below:
+ ```
+ 3,9,7,49,8,64,10,100,15, -> 121,14,144,16,169,21,196,... (fails to identify the squaring pattern in-context)
+ ```
24
+
25
+ ## Evaluation Results
26
+
27
+ [![Radar Chart](radar_chart.png)](radar_chart.png)
28
+
29
+ ### Arithmetic Evaluation
30
+
31
+ The arithmetic evaluation consists of predicting the next term in sequences generated by polynomial functions of varying degrees (arithmetic, quadratic, cubic, quartic), across varying shot counts. The models are evaluated based on the accuracy of their predictions, w/ exact-match. NextTerm-47M outperforms all Qwen models <4B, though larger Qwen models do better on lower-degree polynomials.
32
+
33
+ | Model | Arithmetic | Quadratic | Cubic | Quartic |
34
+ |-------|------------|-----------|-------|---------|
35
+ | **NextTerm-47M** | 94.15% | 81.07% | 37.43% | **15.17%** |
36
+ | Qwen3-0.6B | 90.31% | 8.72% | 0.30% | 0.02% |
37
+ | Qwen3-1.7B | 93.10% | 41.57% | 5.36% | 0.71% |
38
+ | Qwen3-4B | 93.90% | 77.26% | 28.18% | 5.98% |
39
+ | Qwen3-8B | **96.10%** | 80.59% | 32.93% | 7.95% |
40
+ | Qwen3-14B | 95.60% | **84.61%** | **49.16%** | 14.98% |
41
+
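For concreteness, here is a minimal sketch of how one such test case could be constructed and scored (illustrative only; the actual benchmark code lives in the repository linked under Reproducibility, and the coefficients here are arbitrary):

```python
# Build one quadratic test case in the prompt format the model expects and
# score a prediction by exact match.
def poly_sequence(coeffs, length):
    # coeffs = [c0, c1, c2, ...] evaluated at k = 0, 1, 2, ...
    return [sum(c * k**i for i, c in enumerate(coeffs)) for k in range(length)]

seq = poly_sequence([2, -1, 3], 9)                 # a quadratic example
prompt = ",".join(str(t) for t in seq[:-1]) + ","  # "2,4,12,26,46,72,104,142,"
target = str(seq[-1])                              # "186"; must be matched exactly

def is_correct(prediction: str) -> bool:
    return prediction.strip().strip(",") == target
```
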
+ ### OEIS Evaluation
+
+ This evaluation consists of predicting the next term in real OEIS sequences from a held-out, decontaminated validation set. Note that NextTerm-47M is the only model guaranteed not to have seen any of these sequences during training. NextTerm-47M outperforms all Qwen models, including the largest 14B-parameter model.
+
+ | Model | Accuracy |
+ |-------|----------|
+ | **NextTerm-47M** | **29.06%** |
+ | Qwen3-0.6B | 20.53% |
+ | Qwen3-1.7B | 22.65% |
+ | Qwen3-4B | 24.95% |
+ | Qwen3-8B | 25.78% |
+ | Qwen3-14B | 26.78% |
+
+ ### [Ryskina & Knight (2021)](https://arxiv.org/abs/2109.07230) Sequence Completion
+
+ This evaluation consists of predicting the next term in 57 sequences from Ryskina & Knight (2021), compiled to evaluate OEIS-style sequence completion. NextTerm-47M outperforms all other models. Note that this evaluation has not been decontaminated for Qwen or NextTerm-47M, so some models may have seen some of these sequences during training.
+
+ | Model | Accuracy |
+ |-------|----------|
+ | **NextTerm-47M** | **70.18%** |
+ | OEIS-LSTM | 5% |
+ | GPT-2 | 7% |
+ | OEIS Lookup | 53% |
+ | Qwen3-0.6B | 33.33% |
+ | Qwen3-1.7B | 49.12% |
+ | Qwen3-4B | 63.16% |
+ | Qwen3-8B | 57.89% |
+ | Qwen3-14B | 59.65% |
+
+ ### Reproducibility
+
+ The code used to evaluate the model on these benchmarks can be found [here](https://github.com/N8python/NextTerm). This repository includes the evaluation scripts, as well as the script `enhance_data.py` used to generate the augmented training data from OEIS. Augmentations include affine transformations, partial summation, subsampling, permutation, and similar transforms. The enhanced dataset can be found at [N8Programs/oeis-enhanced](https://huggingface.co/datasets/N8Programs/oeis-enhanced/blob/main/oeis_train_uber_shuffled.jsonl), in the `oeis_train_uber_shuffled.jsonl` file.
+
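A toy illustration of the augmentation families named above (a sketch only, not the actual `enhance_data.py` logic; the base sequence and parameters are arbitrary examples):

```python
from itertools import accumulate

# Toy versions of the augmentation families described above.
def affine(seq, a, b):
    return [a * x + b for x in seq]   # affine transformation

def partial_sums(seq):
    return list(accumulate(seq))      # partial summation

def subsample(seq, step):
    return seq[::step]                # subsampling

base = [1, 1, 2, 3, 5, 8, 13, 21]
print(affine(base, 3, -2))    # [1, 1, 4, 7, 13, 22, 37, 61]
print(partial_sums(base))     # [1, 2, 4, 7, 12, 20, 33, 54]
print(subsample(base, 2))     # [1, 2, 5, 13]
```
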
+ Note that, of all the checkpoints produced during training, the one with the lowest validation loss was used for this model card and the evaluation results above. This may slightly inflate the reported performance, but the difference is expected to be minimal. For transparency and testing, all checkpoints produced during training are available at [N8Programs/NextTerm-47M-Checkpts](https://huggingface.co/N8Programs/NextTerm-47M-Checkpts).
+
+ ## Usage
+
+ MLX:
+
+ ```bash
+ mlx_lm.generate --model N8Programs/NextTerm-47M --prompt "1,2,3,"
+ ```
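
The same model can also be driven from Python via `mlx_lm` (a minimal sketch, assuming the repository loads through `mlx_lm`'s standard `load`/`generate` helpers):

```python
from mlx_lm import load, generate

# Load the model and tokenizer from the Hub, then continue a sequence.
model, tokenizer = load("N8Programs/NextTerm-47M")
print(generate(model, tokenizer, prompt="1,2,3,4,5,", max_tokens=50))
```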
+
+ Hugging Face Transformers:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model and tokenizer
+ model_name = "N8Programs/NextTerm-47M"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ # Prompt the model
+ prompt = "1,2,3,4,5,"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ # Generate output
+ outputs = model.generate(inputs["input_ids"], max_new_tokens=5000)
+ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ print(f"Prompt: {prompt}")
+ print(f"Output: {result}")
+ ```
+
+ ## Attribution
+
+ This model and its training dataset were created using data from the
+ **On-Line Encyclopedia of Integer Sequences (OEIS)**.
+
+ - Source: https://oeis.org/
+ - License: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
+ - OEIS End-User License Agreement: https://oeis.org/wiki/The_OEIS_End-User_License_Agreement
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "model_type": "qwen3",
+   "hidden_size": 512,
+   "num_hidden_layers": 12,
+   "intermediate_size": 2048,
+   "num_attention_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "vocab_size": 15,
+   "num_key_value_heads": 4,
+   "max_position_embeddings": 2048,
+   "rope_theta": 10000,
+   "head_dim": 64,
+   "tie_word_embeddings": false,
+   "bos_token_id": 12,
+   "eos_token_id": 13,
+   "pad_token_id": 14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89ac2735c2506cead08ba6b04b161619f6cd640c1b4b15b3eac55af09e9ea4f4
+ size 188877179
radar_chart.png ADDED

Git LFS Details

  • SHA256: ea7b6d9dfc3a44430fd325485128f095167f5011cff667c8c63705ec21e07649
  • Pointer size: 131 Bytes
  • Size of remote file: 404 kB
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "<bos>",
+   "eos_token": "<eos>",
+   "pad_token": "<pad>",
+   "unk_token": "<pad>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,149 @@
+ {
+   "version": "1.0",
+   "truncation": null,
+   "padding": null,
+   "added_tokens": [
+     {
+       "id": 12,
+       "content": "<bos>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 13,
+       "content": "<eos>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 14,
+       "content": "<pad>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     }
+   ],
+   "normalizer": {
+     "type": "Sequence",
+     "normalizers": [
+       {
+         "type": "Replace",
+         "pattern": {
+           "Regex": "[^0-9,-]+"
+         },
+         "content": ""
+       },
+       {
+         "type": "Replace",
+         "pattern": {
+           "Regex": ",+"
+         },
+         "content": ","
+       },
+       {
+         "type": "Strip",
+         "strip_left": true,
+         "strip_right": true
+       },
+       {
+         "type": "Replace",
+         "pattern": {
+           "Regex": "^,+"
+         },
+         "content": ""
+       },
+       { "type": "Replace", "pattern": { "Regex": ",{2,}$" }, "content": "," }
+     ]
+   },
+   "pre_tokenizer": {
+     "type": "Split",
+     "pattern": {
+       "Regex": ""
+     },
+     "behavior": "Isolated",
+     "invert": false
+   },
+   "post_processor": {
+     "type": "TemplateProcessing",
+     "single": [
+       {
+         "SpecialToken": {
+           "id": "<bos>",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       }
+     ],
+     "pair": [
+       {
+         "Sequence": {
+           "id": "A",
+           "type_id": 0
+         }
+       },
+       {
+         "Sequence": {
+           "id": "B",
+           "type_id": 1
+         }
+       }
+     ],
+     "special_tokens": {
+       "<bos>": {
+         "id": "<bos>",
+         "ids": [
+           12
+         ],
+         "tokens": [
+           "<bos>"
+         ]
+       }
+     }
+   },
+   "decoder": {
+     "type": "Sequence",
+     "decoders": [
+       {
+         "type": "Replace",
+         "pattern": {
+           "String": " "
+         },
+         "content": ""
+       }
+     ]
+   },
+   "model": {
+     "type": "WordLevel",
+     "vocab": {
+       "0": 0,
+       "1": 1,
+       "2": 2,
+       "3": 3,
+       "4": 4,
+       "5": 5,
+       "6": 6,
+       "7": 7,
+       "8": 8,
+       "9": 9,
+       "-": 10,
+       ",": 11,
+       "<bos>": 12,
+       "<eos>": 13,
+       "<pad>": 14
+     },
+     "unk_token": "<pad>"
+   }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "added_tokens_decoder": {
+     "12": {
+       "content": "<bos>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "13": {
+       "content": "<eos>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "14": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<bos>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<eos>",
+   "extra_special_tokens": {},
+   "model_max_length": 2048,
+   "pad_token": "<pad>",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "<pad>"
+ }