tjadamlee's Collections
text-pretrain-data
Updated Feb 21
Some pretraining datasets for LLMs
allenai/MADLAD-400 • Updated Oct 31, 2023 • 36 • 115
CASIA-LM/ChineseWebText • Viewer • Updated Nov 13, 2023 • 1k • 11 • 32
allenai/dolma • Updated Apr 17 • 1.33k • 763
allenai/peS2o • Updated Nov 28, 2023 • 442 • 133
Skywork/SkyPile-150B • Viewer • Updated Dec 7, 2023 • 1.76M • 93 • 304
wenge-research/yayi2_pretrain_data • Viewer • Updated Dec 29, 2023 • 1.68M • 30 • 50
togethercomputer/RedPajama-Data-V2 • Viewer • Updated Jan 18 • 2.25M • 5.93k • 329
tiiuae/falcon-refinedweb • Viewer • Updated Jun 20, 2023 • 968M • 1.83k • 776
togethercomputer/RedPajama-Data-1T • Viewer • Updated Jun 17 • 1.73M • 2.66k • 1.03k
cerebras/SlimPajama-627B • Viewer • Updated Jul 7, 2023 • 2.16M • 5.24k • 398
Tele-AI/TeleChat-PTD • Updated Mar 20 • 5 • 158
open-web-math/open-web-math • Viewer • Updated Oct 17, 2023 • 6.32M • 763 • 243
GAIR/MathPile • Preview • Updated Jun 13 • 392 • 171
EleutherAI/proof-pile-2 • Updated Oct 25, 2023 • 502 • 159
vietgpt/open-web-math • Viewer • Updated Nov 21, 2023 • 6.32M • 2
meta-math/MetaMathQA • Viewer • Updated Dec 21, 2023 • 395k • 7.09k • 277
bigcode/the-stack-dedup • Viewer • Updated Aug 17, 2023 • 237M • 1.92k • 316
bigcode/the-stack • Viewer • Updated Apr 13, 2023 • 546M • 1.22k • 711
nampdn-ai/tiny-strange-textbooks • Viewer • Updated Feb 2 • 1M • 126 • 83
uonlp/CulturaX • Viewer • Updated 4 days ago • 7.18B • 26.6k • 439
Locutusque/UltraTextbooks • Viewer • Updated Feb 2 • 5.52M • 54 • 187
HuggingFaceTB/cosmopedia • Viewer • Updated Apr 16 • 31.1M • 8.03k • 530