mosesananta's Collections
Everything about LLM
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Paper • 2309.12288 • Published • 3
Are Emergent Abilities in Large Language Models just In-Context Learning?
Paper • 2309.01809 • Published • 3
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Paper • 2309.04564 • Published • 14
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 77
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Paper • 2309.11674 • Published • 29
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 84
SlimPajama-DC: Understanding Data Combinations for LLM Training
Paper • 2309.10818 • Published • 10
Baichuan 2: Open Large-scale Language Models
Paper • 2309.10305 • Published • 16
FLM-101B: An Open LLM and How to Train It with $100K Budget
Paper • 2309.03852 • Published • 42
Small-scale proxies for large-scale Transformer training instabilities
Paper • 2309.14322 • Published • 17
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 57
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 16
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 70
Paper • 2309.16671 • Published • 17
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Paper • 2305.07759 • Published • 28
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 69
Paper • 2309.16609 • Published • 30
Language models in molecular discovery
Paper • 2309.16235 • Published • 10
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 21
When can transformers reason with abstract symbols?
Paper • 2310.09753 • Published • 2
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper • 2310.09478 • Published • 15
Llemma: An Open Language Model For Mathematics
Paper • 2310.10631 • Published • 42
Improving Large Language Model Fine-tuning for Solving Math Problems
Paper • 2310.10047 • Published • 5
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 26
Dissecting In-Context Learning of Translations in GPTs
Paper • 2310.15987 • Published • 5