Reproducing DeepSeek-R1-Zero with Qwen2.5-0.5B on two 4090 GPUs
rasdani
Recent Activity
Updated a collection 4 days ago: smolR1
Updated a model 4 days ago: rasdani/smolR1-Qwen2.5-0.5B
Collections: 1
Papers: 1
Models: 21

rasdani/smolR1-Qwen2.5-0.5B • Text Generation • 13
rasdani/Qwen2.5-0.5B-simpleRL-Zoo • Text Generation • 58
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-no-KL
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-3072k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-4096k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-2560k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-2048k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-first-try • 2
rasdani/Qwen-1.5B-Distill-GRPO • Text Generation • 3
rasdani/Qwen-0.5B-Instruct-GRPO
Datasets: 87
rasdani/simplerl_qwen_level1to4 • 8.14k • 58
rasdani/smol-RLVR-IFeval • 15k • 76
rasdani/countdown • 329k • 69
rasdani/verifiable-coding-problems-python_decontaminated_fewer_test_cases • 27.8k • 170
rasdani/cohere-wikipedia-2023-11-en-queries-debug • 10 • 163
rasdani/cohere-wikipedia-2023-11-sv-queries • 1.5k • 58 • 1
rasdani/cohere-wikipedia-2023-11-no-queries • 1.5k • 66
rasdani/cohere-wikipedia-2023-11-no-1.5k-articles-positives • 1.5k • 62
rasdani/cohere-wikipedia-2023-11-no-1.5k-articles • 13.5k • 85
rasdani/cohere-wikipedia-2023-11-sv-1.5k-articles-positives • 1.5k • 70