simarora committed
Commit 8500eb0
1 Parent(s): e872cd3

Update README.md

Files changed (1): README.md +42 -2
README.md CHANGED

---
datasets:
- EleutherAI/pile
language:
- en
---

# Model Card

This model is a pretrained Mamba model. The goal of this model is to provide a quality reference for the Based architecture.

As quality references, we also include a pretrained attention model (Llama architecture) at https://huggingface.co/hazyresearch/attn-1b and a pretrained Based model at https://huggingface.co/hazyresearch/based-1b.

All three checkpoints are pretrained on **10Bn tokens** of the Pile, in the exact same data order, using next-token prediction.

### Model Sources

The model implementation and the training code that produced this checkpoint are provided at https://github.com/HazyResearch/based.
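
If you want to poke at the checkpoint before diving into that repository, a minimal loading sketch with the Hugging Face `transformers` API might look like the following. Everything repo-specific here is an assumption: this card does not state the checkpoint's repository id (the `hazyresearch/mamba-1b` id below is hypothetical) and the GPT-2 tokenizer is a guess, so prefer the reference loading code in the GitHub repository above.

```python
# Hypothetical loading sketch -- the repo id and tokenizer below are
# assumptions, not confirmed by this model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "hazyresearch/mamba-1b",  # hypothetical repo id for this checkpoint
    trust_remote_code=True,   # custom architecture code ships with the repo
)
model.eval()

# Greedy next-token generation, matching the pretraining objective.
inputs = tokenizer("The Pile is a large, diverse", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```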

### Uses

The purpose of this work is to evaluate the language modeling quality of a new efficient architecture, Based.

We include a series of benchmarks that you can use to evaluate quality (a loading sketch follows the list):
- FDA: https://huggingface.co/datasets/hazyresearch/based-fda
- SWDE: https://huggingface.co/datasets/hazyresearch/based-swde
- SQUAD: https://huggingface.co/datasets/hazyresearch/based-squad
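
As a sketch of how you might pull one of these benchmarks with the standard `datasets` library (the code deliberately avoids assuming split names or field layouts, which this card does not specify; check each dataset card for the evaluation protocol):

```python
# Minimal sketch: download a benchmark and inspect its structure.
from datasets import load_dataset

fda = load_dataset("hazyresearch/based-fda")  # or based-swde / based-squad
print(fda)                        # shows the available splits
split = next(iter(fda.values()))  # grab the first split without naming it
print(split[0])                   # inspect one example's fields
```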

## Citation

Please consider citing our paper if you use this work:

```
@article{arora2024simple,
  title={Simple linear attention language models balance the recall-throughput tradeoff},
  author={Arora, Simran and Eyuboglu, Sabri and Zhang, Michael and Timalsina, Aman and Alberti, Silas and Zinsley, Dylan and Zou, James and Rudra, Atri and Ré, Christopher},
  journal={arXiv:2402.18668},
  year={2024}
}
```

Please reach out to simarora@stanford.edu, eyuboglu@stanford.edu, and mzhang20@stanford.edu with questions.