rubbyninja's Collections
advancing research
updated
STaR: Bootstrapping Reasoning With Reasoning
Paper
•
2203.14465
•
Published
•
8
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper
•
2401.06066
•
Published
•
43
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
•
2405.04434
•
Published
•
14
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper
•
2311.04934
•
Published
•
28
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper
•
2403.09629
•
Published
•
74
Let's Verify Step by Step
Paper
•
2305.20050
•
Published
•
10
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper
•
2407.21787
•
Published
•
12
Solving math word problems with process- and outcome-based feedback
Paper
•
2211.14275
•
Published
•
7
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
135
Aligning Machine and Human Visual Representations across Abstraction Levels
Paper
•
2409.06509
•
Published
•
1
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Paper
•
2410.05229
•
Published
•
21
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Paper
•
2410.01131
•
Published
•
9
Paper
•
2303.01469
•
Published
•
8
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
Paper
•
2410.11081
•
Published
•
19
Scaling Laws for Precision
Paper
•
2411.04330
•
Published
•
6
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Paper
•
2411.07279
•
Published
•
3
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
Paper
•
1909.13231
•
Published
•
1
Better & Faster Large Language Models via Multi-token Prediction
Paper
•
2404.19737
•
Published
•
73
O1 Replication Journey: A Strategic Progress Report -- Part 1
Paper
•
2410.18982
•
Published
•
1
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper
•
2411.16489
•
Published
•
40
ReFT: Reasoning with Reinforced Fine-Tuning
Paper
•
2401.08967
•
Published
•
29
Paper
•
2408.02666
•
Published
•
27