Lucy Li
lucy3
AI & ML interests
None yet
Recent Activity
authored a paper 9 days ago
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
authored a paper 9 days ago
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Organizations
Datasets
None public yet