Add model card metadata: pipeline tag, library name, link to paper, and link to code repository.

#1
by nielsr (HF staff) - opened
Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -1,3 +1,9 @@
+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
+---
+
 From scratch pretraining on English only: no synthetic data, no code, 3 epochs of 1 GB of data for the ~125M param model.
 
 Test network using [Tensor Product Attention](https://arxiv.org/abs/2501.06425). Other than some alterations to the attention, such as 16 heads instead of 9 and using TPA, this is the same setup as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
@@ -21,3 +27,7 @@ One of the primary reported benefits for TPA are for inference which are not rea
 - Final Train Perplexity: 20.95
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/637f3b03932a61b89aefbf5c/8iTSQFvwgbn5or6LdNT9G.png)
+
+# Code
+
+The code is available at: https://github.com/tensorgi/T6.
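For readers unfamiliar with Tensor Product Attention, here is a minimal sketch of the core idea from the linked paper: instead of a single dense projection per head, each of Q, K and V is built per token as a sum of R outer products between a head-side factor and a feature-side factor. The dimensions and rank below are illustrative, not this model's actual config, and details such as per-component ranks, RoPE placement, and factor-only KV caching follow the official implementation in the linked T6 repository rather than this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TPAttention(nn.Module):
    """Sketch of Tensor Product Attention: Q, K, V for each token are a sum of
    `rank` outer products between a head factor (n_heads) and a feature factor
    (head_dim), rather than one dense per-head projection."""

    def __init__(self, d_model=576, n_heads=16, head_dim=64, rank=2):
        super().__init__()
        self.n_heads, self.head_dim, self.rank = n_heads, head_dim, rank
        # Head-side and feature-side factor projections, shared layout for Q/K/V.
        self.a_proj = nn.Linear(d_model, 3 * rank * n_heads)   # head factors
        self.b_proj = nn.Linear(d_model, 3 * rank * head_dim)  # feature factors
        self.out_proj = nn.Linear(n_heads * head_dim, d_model)

    def forward(self, x, causal=True):
        bsz, seq, _ = x.shape
        a = self.a_proj(x).view(bsz, seq, 3, self.rank, self.n_heads)
        b = self.b_proj(x).view(bsz, seq, 3, self.rank, self.head_dim)
        # Sum of rank-1 outer products -> (bsz, seq, 3, n_heads, head_dim).
        qkv = torch.einsum("bstrh,bstrd->bsthd", a, b) / self.rank
        q, k, v = (t.transpose(1, 2) for t in qkv.unbind(dim=2))
        # The inference benefit mentioned in the card (smaller KV cache from
        # storing only the K/V factors) is not implemented in this sketch.
        o = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
        o = o.transpose(1, 2).reshape(bsz, seq, self.n_heads * self.head_dim)
        return self.out_proj(o)
```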
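With the `library_name: transformers` and `pipeline_tag: text-generation` metadata added above, the checkpoint can be loaded through the standard text-generation pipeline. A minimal usage sketch follows; the repo id is a placeholder (the card does not state the final model id), and `trust_remote_code=True` is an assumption in case the TPA attention ships as custom modeling code bundled with the checkpoint rather than inside the transformers library.

```python
from transformers import pipeline

# Placeholder repo id: substitute the actual model id for this card.
model_id = "your-username/tpa-125m"

# trust_remote_code=True is an assumption; drop it if the architecture is
# supported natively by transformers.
generator = pipeline("text-generation", model=model_id, trust_remote_code=True)

print(generator("Tensor Product Attention is", max_new_tokens=40)[0]["generated_text"])
```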