---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
library_name: transformers
---
|
|
|
<div align="center">
<span style="font-family: default; font-size: 1.5em;">FastCuRL-1.5B-Preview</span>
</div>
|
|
|
## FastCuRL Overview
|
|
|
### 2025-03-17
|
|
|
We release **FastCuRL-1.5B-Preview**, a slow-thinking reasoning model that **outperforms** the previous SoTA *DeepScaleR-1.5B-Preview* using only **50% of the training steps**! We apply a novel curriculum-guided iterative lengthening reinforcement learning strategy to *DeepSeek-R1-Distill-Qwen-1.5B* and observe continuous performance improvements as training steps increase. To make our work easier to reproduce and to advance research progress, we open-source our code, model, and data.
|
|
|
Code: https://github.com/nick7nlp/FastCuRL
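
For quick experimentation, the model can be loaded with `transformers` (the card declares `library_name: transformers`). The snippet below is a minimal, illustrative sketch: the repository id, prompt, and generation settings are placeholders rather than official recommendations.

```python
# Minimal, illustrative usage sketch with the transformers library.
# NOTE: the repository id below is a placeholder; replace it with the actual
# Hugging Face repository hosting FastCuRL-1.5B-Preview.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nickyang/FastCuRL-1.5B-Preview"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "If x + y = 10 and xy = 21, what is x^2 + y^2?"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Slow-thinking models emit long reasoning traces, so allow a generous budget.
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```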
|
|
|
### 2025-03-21
|
|
|
Paper: https://arxiv.org/abs/2503.17287
|
|
|
## Key Results
|
|
|
We report Pass@1 accuracy averaged over 16 samples for each problem.
|
|
|
| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
|-------|-----------|----------|----------|--------------|---------------|------|
| Qwen2.5-Math-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | **39.7** | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
| **FastCuRL-1.5B-Preview** | **43.1** | **88.0** | **74.2** | 31.6 | **50.4** | **57.5** |
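
The reported metric is straightforward to compute: sample 16 completions per problem, score each as correct or not, and average across problems. The snippet below is an illustrative sketch of that computation, not the project's evaluation harness.

```python
# Illustrative sketch: Pass@1 averaged over k samples per problem.
# results[i][j] is True if the j-th sampled completion for problem i is correct.
from typing import List

def avg_pass_at_1(results: List[List[bool]]) -> float:
    """Mean over problems of (correct samples / total samples)."""
    per_problem = [sum(samples) / len(samples) for samples in results]
    return sum(per_problem) / len(per_problem)

# Example: 3 problems, 16 sampled completions each (booleans are made up).
example = [[True] * 10 + [False] * 6,
           [True] * 16,
           [False] * 16]
print(f"{avg_pass_at_1(example) * 100:.1f}")  # 54.2
```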
|
|
|
## Training Data

Following DeepScaleR, our training dataset consists of 40,315 unique problem-answer pairs compiled from the following sources (see the sketch after the list):

- AIME problems (1984-2023)
- AMC problems (before 2023)
- Omni-MATH dataset
- Still dataset
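
The released code contains the actual preprocessing; as a rough, hypothetical sketch of what "unique problem-answer pairs" implies, the merged sources can be deduplicated on normalized problem text, e.g.:

```python
# Hypothetical sketch: merge problem-answer sources and keep unique problems.
# Field names ("problem", "answer") are assumptions, not the project's actual schema.
from typing import Dict, Iterable, List

def deduplicate(sources: Iterable[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    seen, unique = set(), []
    for source in sources:
        for item in source:
            key = " ".join(item["problem"].split()).lower()  # normalize whitespace/case
            if key not in seen:
                seen.add(key)
                unique.append(item)
    return unique

# Toy example standing in for the AIME / AMC / Omni-MATH / Still sources.
source_a = [{"problem": "Compute 2 + 2.", "answer": "4"}]
source_b = [{"problem": "Compute  2 + 2.", "answer": "4"},  # duplicate after normalization
            {"problem": "Compute 3 * 3.", "answer": "9"}]
print(len(deduplicate([source_a, source_b])))  # 2
```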
|
|
|
## Acknowledgements
|
|
|
- Our training experiments are powered by our heavily modified fork of [verl](https://github.com/volcengine/verl) and [deepscaler](https://github.com/agentica-project/deepscaler).
- Our model is trained on top of [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).