Update README.md
README.md CHANGED
@@ -27,4 +27,6 @@ The model architecture is LLaMA-1.3B and we adopt the [OpenLLaMA](https://github
 The model is pre-trained on 150B tokens of Data-Juicer's refined RedPajama and Pile.
 It achieves an average score of 34.21 over 16 HELM tasks, beating Falcon-1.3B (trained on 350B tokens from RefinedWeb), Pythia-1.4B (trained on 300B tokens from the original Pile), and Open-LLaMA-1.3B (trained on 150B tokens from the original RedPajama and Pile).
 
-For more details, please refer to our [paper](https://arxiv.org/abs/2309.02033).
+For more details, please refer to our [paper](https://arxiv.org/abs/2309.02033).
+
+![exp_llama](https://img.alicdn.com/imgextra/i2/O1CN019WtUPP1uhebnDlPR8_!!6000000006069-2-tps-2530-1005.png)
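Since the README describes a standard LLaMA-architecture checkpoint (via the OpenLLaMA implementation), it should load with the stock Hugging Face `transformers` API. Below is a minimal loading sketch; the repo id is an assumption for illustration only and is not taken from this commit.

```python
# Minimal loading sketch for the LLaMA-1.3B checkpoint described above.
# NOTE: the repo id below is an assumed placeholder, not confirmed by this
# commit; replace it with the id actually published on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "datajuicer/LLaMA-1B-dj-refine-150B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("Data quality matters because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```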