ron-wolf's Collections
Reading list
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper
•
2412.11768
•
Published
•
44
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper
•
2412.14161
•
Published
•
52
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Paper
•
2408.10945
•
Published
•
11
PDFTriage: Question Answering over Long, Structured Documents
Paper
•
2309.08872
•
Published
•
54
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Paper
•
2412.13171
•
Published
•
36
The Matrix Calculus You Need For Deep Learning
Paper
•
1802.01528
•
Published
•
2
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Paper
•
2202.05780
•
Published
Recurrent Memory Transformer
Paper
•
2207.06881
•
Published
•
1
How many words does ChatGPT know? The answer is ChatWords
Paper
•
2309.16777
•
Published
•
1
Weaver: Foundation Models for Creative Writing
Paper
•
2401.17268
•
Published
•
46
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Paper
•
2308.09687
•
Published
•
7
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Paper
•
2306.05426
•
Published
Think before you speak: Training Language Models With Pause Tokens
Paper
•
2310.02226
•
Published
•
2
What do tokens know about their characters and how do they know it?
Paper
•
2206.02608
•
Published
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper
•
2404.07143
•
Published
•
110
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper
•
2503.05179
•
Published
•
46
Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers
Paper
•
2504.18412
•
Published
•
1
Chain of Draft: Thinking Faster by Writing Less
Paper
•
2502.18600
•
Published
•
50
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Paper
•
2506.19697
•
Published
•
44
Jasper and Stella: distillation of SOTA embedding models
Paper
•
2412.19048
•
Published
•
1
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Paper
•
2301.13688
•
Published
•
9
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Paper
•
2411.19146
•
Published
•
18
Chain-of-Thought Reasoning Without Prompting
Paper
•
2402.10200
•
Published
•
110
Robust and Fine-Grained Detection of AI Generated Texts
Paper
•
2504.11952
•
Published
•
12
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper
•
2507.00432
•
Published
•
59