---
datasets:
- EleutherAI/pile
language:
- en
---
# Model Card

This model is pretrained as a reference baseline to the Based model provided here: https://huggingface.co/hazyresearch/based-1b-50b.

Both checkpoints are pretrained on **50B tokens** of the Pile, in the exact same data order, using next-token prediction.

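For reference, the pretraining objective is the standard causal language-modeling loss. The sketch below is illustrative only (the actual training loss lives in the repository linked under Model Sources); it shifts the inputs by one position so each token predicts its successor.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size) from the model; input_ids: (batch, seq_len).
    # Position t predicts token t+1, so drop the last prediction and the first label.
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = input_ids[:, 1:].reshape(-1)
    return F.cross_entropy(shift_logits, shift_labels)
```
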
### Model Sources

The model is a standard Transformer using the Llama architecture (rotary position embeddings, SwiGLU, RMSNorm, etc.).

The training code is provided here and can be used to reproduce training: https://github.com/HazyResearch/based

The paper for this work is here, and its appendix includes additional experimental details and hyperparameters: https://arxiv.org/abs/2402.18668

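For quick experimentation, a minimal loading sketch is below. It assumes the checkpoint can be loaded through the standard Transformers `AutoModelForCausalLM` path with `trust_remote_code=True`; the repo id shown is a placeholder, and the exact loading utilities are documented in the based repository linked above, so treat this as a starting point rather than the definitive recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute this checkpoint's actual Hugging Face id.
model_id = "hazyresearch/<this-checkpoint>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code is an assumption; the based repository may provide its own loading code.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Pile is a large, diverse dataset of", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
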
### Uses

The purpose of this work is to evaluate the language modeling quality of a new efficient architecture, Based.

We include a series of benchmarks that you can use to evaluate quality (a loading sketch follows the list):
- FDA: https://huggingface.co/datasets/hazyresearch/based-fda
- SWDE: https://huggingface.co/datasets/hazyresearch/based-swde
- SQuAD: https://huggingface.co/datasets/hazyresearch/based-squad

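A minimal sketch for pulling these benchmarks with the `datasets` library is below; the split and column names are not specified in this card, so inspect each dataset card for the actual schema before wiring up an evaluation.

```python
from datasets import load_dataset

# Load the three recall-intensive benchmarks and print their structure
# (splits and columns are not documented in this card).
for name in ["hazyresearch/based-fda", "hazyresearch/based-swde", "hazyresearch/based-squad"]:
    ds = load_dataset(name)
    print(name, ds)
```
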
## Citation

Please consider citing this paper if you use our work:

```
@article{arora2024simple,
  title={Simple linear attention language models balance the recall-throughput tradeoff},
  author={Arora, Simran and Eyuboglu, Sabri and Zhang, Michael and Timalsina, Aman and Alberti, Silas and Zinsley, Dylan and Zou, James and Rudra, Atri and Ré, Christopher},
  journal={arXiv:2402.18668},
  year={2024}
}
```

Please reach out to simarora@stanford.edu, eyuboglu@stanford.edu, and mzhang20@stanford.edu with questions.