Datasets and models for EMNLP paper "Scalable Data Ablation Approximations for Language Models through Modular Training and Merging"
Clara Na
claran
Recent Activity
Authored a paper about 2 months ago: Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Updated a dataset about 2 months ago: claran/modular-s2orc
Updated a collection about 2 months ago: Scalable Data Ablations
Collections: 1
Papers: 1
Models: 30
claran/s2orc-biology1994-1999-ind-130m • Updated • 3
claran/s2orc-biology2007-2008-ind-130m • Updated • 8
claran/s2orc-biology2013-2013-ind-130m • Updated • 5
claran/s2orc-biology2021-2021-ind-130m • Updated • 7
claran/s2orc-biology2019-2019-ind-130m • Updated • 4
claran/s2orc-biology2000-2003-ind-130m • Updated • 3
claran/s2orc-biology2015-2015-ind-130m • Updated • 9
claran/s2orc-biology2014-2014-ind-130m • Updated • 5
claran/s2orc-biology2004-2006-ind-130m • Updated • 7
claran/s2orc-biology2016-2016-ind-130m • Updated • 10
Datasets: 6
claran/modular-s2orc • Viewer • Updated • 7.47M • 3.56k • 2
claran/seed-pretrain-decon • Viewer • Updated • 3.45M • 151
claran/m2d2-wiki-decon • Viewer • Updated • 5.3M • 892
claran/seed-pretrain-decon-parquet • Viewer • Updated • 6.61M • 127
claran/m2d2-wiki-decon-parquet • Viewer • Updated • 10.6M • 7.43k
claran/modular-s2orc-parquet • Viewer • Updated • 7.47M • 8.06k • 1