# Model Card for reissbaker/r1-llama-70b-distill-lora
This LoRA adapter was distilled from deepseek-ai/DeepSeek-R1 and uses meta-llama/Llama-3.1-70B-Instruct as a base. Despite being a mere rank-32 LoRA adapter on top of Llama-3.1-70B-Instruct, trained on fewer than 10k prompt/completion examples, it significantly outperforms the base model and gpt-4o on MATH-500, AIME24, and GPQA Diamond, and outperforms Claude-3.5-Sonnet on two of the three, showing that system-2-style thinking can be trained cheaply into a very small number of parameters.
## Model Details
The adapter was distilled by running the prompts (not the completions) from the following datasets through DeepSeek-R1, using unfat to generate a dataset of prompt/completion pairs:
- EleutherAI/hendrycks_math (train split)
- PrimeIntellect/verifiable-coding-problems (train split, first 500 rows)
- mlabonne/harmless_alpaca (train split, first 1k rows)
- euclaise/logician (train split, first 1k rows)
- isaiahbjork/cot-logic-reasoning (train split, first 1.4k rows)
We then lightly cleaned the generated dataset by stripping any completions that were missing closing `</think>` tags, and trained for 2 epochs using Together.ai's serverless fine-tuning platform. The total cost of extracting the data via glhf.chat's API and training the model on Together was less than $450.
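The cleaning step above can be sketched as a simple filter that drops any example whose completion lacks a closing `</think>` tag (a minimal sketch; the record format is an assumption):

```python
# Drop prompt/completion pairs whose completion is missing the closing
# </think> tag, i.e. where R1's reasoning trace was truncated.
def clean(pairs: list[dict]) -> list[dict]:
    return [p for p in pairs if "</think>" in p["completion"]]

examples = [
    {"prompt": "p1", "completion": "<think>steps...</think> answer"},
    {"prompt": "p2", "completion": "<think>truncated reasoning"},  # dropped
]
print(len(clean(examples)))  # 1
```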
We suspect even better performance could be achieved with larger training sets and improved data cleaning, e.g. formally verifying R1 outputs and only training on correct answers, as OpenR1-Math-220k attempts.
## Model Description
- Developed by: @reissbaker
- Funded by: Synthetic Lab
- License: Apache 2.0
- Finetuned from model: Llama 3.1 70B Instruct
## How to Get Started with the Model
Run the model with one click on glhf.chat by copying this repo URL and launching the model.
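The model can also be queried programmatically through glhf.chat's OpenAI-compatible API. The endpoint URL and the `hf:` model-id prefix below are assumptions based on glhf.chat's conventions, so check their docs before relying on them:

```python
# Build a chat-completion payload for an OpenAI-compatible endpoint.
# The "hf:" prefix for Hugging Face repos is an assumption about
# glhf.chat's naming scheme; the endpoint URL is likewise unverified.
import json
import urllib.request

def build_request(repo_id: str, prompt: str) -> dict:
    return {
        "model": f"hf:{repo_id}",
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request(
    "reissbaker/r1-llama-70b-distill-lora",
    "How many primes are there below 100?",
)
print(payload["model"])  # hf:reissbaker/r1-llama-70b-distill-lora

# To actually send it (requires an API key):
# req = urllib.request.Request(
#     "https://glhf.chat/api/openai/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <key>",
#              "Content-Type": "application/json"},
# )
```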
## Eval results
We used open-r1 to evaluate the models, since, unlike some other popular eval frameworks, it has successfully reproduced DeepSeek's published R1-distill evals. The results are as follows:
| Model | MATH-500 | AIME24 | GPQA Diamond |
|---|---|---|---|
| reissbaker/r1-llama-70b-distill-lora | 86.8 | 20.0 | 61.6 |
| meta-llama/Llama-3.1-70B-Instruct | 56.6 | 10.0 | 45.9 |
| gpt-4o-0513 † | 74.6 | 9.3 | 49.9 |
| Claude-3.5-Sonnet-1022 † | 78.3 | 16.0 | 65.0 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B † | 94.5 | 70.0 | 65.2 |
† Eval results reported by DeepSeek
Our LoRA significantly outperforms non-reasoning models on most benchmarks, and is a large improvement over the base Llama-3.1-70B-Instruct model. However, DeepSeek's full-parameter finetune, trained on larger datasets, significantly outperforms it on these tasks.
## Training Hyperparameters
- Learning rate: 4e-4
- Rank: 32
- Alpha: 16
- Dropout: 0.01
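These map directly onto a PEFT-style LoRA configuration. The dict below is a sketch using PEFT's field names (`r`, `lora_alpha`, `lora_dropout`); target modules and other trainer settings are assumptions not stated in this card:

```python
# The hyperparameters above as a plain config dict, using PEFT-style
# field names. Target modules and any other trainer settings are
# assumptions, not taken from the card.
lora_config = {
    "r": 32,             # adapter rank
    "lora_alpha": 16,    # scaling factor: update scaled by alpha / r
    "lora_dropout": 0.01,
}
training_config = {
    "learning_rate": 4e-4,
    "num_epochs": 2,     # from the training description above
}
# Effective scaling applied to the LoRA update at this rank/alpha:
scale = lora_config["lora_alpha"] / lora_config["r"]
print(scale)  # 0.5
```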