Text Generation · Transformers · GGUF · English · conversational

Mungert committed · verified commit 6959e0c · 0 parent(s)

Super-squash history to reclaim storage
.gitattributes ADDED
@@ -0,0 +1,70 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-f16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-bf16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-f16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-bf16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-f16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-bf16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q3_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q6_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q3_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q6_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q4_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-q5_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-iq3_xs.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-iq3_xxs.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-iq3_s.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-iq3_m.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-iq4_xs.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-bf16.gguf filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview.imatrix filter=lfs diff=lfs merge=lfs -text
+ DeepCoder-1.5B-Preview-iq4_nl.gguf filter=lfs diff=lfs merge=lfs -text
DeepCoder-1.5B-Preview-bf16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f6b30a828d33b8d585d1801d1a2e4e1093109c5f1f435c7b8529a4cd149c9db
+ size 3560417472
DeepCoder-1.5B-Preview-bf16_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d8f93f925515eaf1fefd742aa579e2882807418446eedff97e68422141b399b2
+ size 2332108992
DeepCoder-1.5B-Preview-f16_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:831434f1537421e110dd9ed61011681e3f15cdda10aecb84181494aeab854b24
+ size 2332108992
DeepCoder-1.5B-Preview-iq3_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:644c25b7a7d2a773e2c9d6497df1f16339cf9b7d16bd0defa9b875cd11c5929b
+ size 906114528
DeepCoder-1.5B-Preview-iq3_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c403621bf91a9ae2d4b2b6260829381720f3d943d9a66e4e41b9fc244f5941e1
+ size 891857376
DeepCoder-1.5B-Preview-iq3_xs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90d470df9fb770bed5ba8f7eef32c34ff8b79ca1953662ce195766fea6729000
+ size 861149664
DeepCoder-1.5B-Preview-iq3_xxs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e00774d75692715f302a120bc572b2087ba1148191f94ab2cad9892e7e5f0c5
+ size 829237728
DeepCoder-1.5B-Preview-iq4_nl.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24591cd4789376c3c98f363938c59f8fdaa20237ac7afb9574bf48992a622e5e
+ size 1067604960
DeepCoder-1.5B-Preview-iq4_xs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54650b029ded0f24dd097bc4a68461843d1f9aa3de60e2f03259dded647e7f07
+ size 1019712480
DeepCoder-1.5B-Preview-q3_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e49789647cbb9d59370c6ff4d97e1038edabafb089e1b7c18369b83310478a28
+ size 1015619040
DeepCoder-1.5B-Preview-q3_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e285a6ab6ce15a480e648dcc8e59c8c09bb1a276465859121ae046c2a750cbf0
+ size 890395104
DeepCoder-1.5B-Preview-q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d1281e30d586889db3f34131ab4435adb4dfdd25d6a0b597e614cfb72703e24
+ size 1006062048
DeepCoder-1.5B-Preview-q4_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:074d50de02dea66145e26f7e53410e9313e13b6511b4680760a632ed977425db
+ size 1117120992
DeepCoder-1.5B-Preview-q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0835177cc212075995fd9eea1043f32bcbb33364773485ba60353e7ec29e383
+ size 1177488864
DeepCoder-1.5B-Preview-q4_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5480f38e6e717a2c423415d9884125e2f67ba88c6828348760076166a774db1
+ size 1131752928
DeepCoder-1.5B-Preview-q5_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96f291d059598823d326727d0786a535cdd3fdea02fe915ee1c805c019496de2
+ size 1228179936
DeepCoder-1.5B-Preview-q5_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:afa9f5e710a808f8ce4afb23d38bd1fc9d5edfdffc92a79f28667cb1c50d029a
+ size 1339238880
DeepCoder-1.5B-Preview-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:767192b818d1b4e2f8b91cfb49d29ceea453e5f63e9d38ad4b166b55c7afb56f
+ size 1316490720
DeepCoder-1.5B-Preview-q5_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc790a2ea07fb1104a8d6e8a589dc86be2c499f1a8b7ede09b15ac9f3718f313
+ size 1290169824
DeepCoder-1.5B-Preview-q6_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01d45f0a9e9a768222deecce8d7eda7d33b6d226034dd1f20846f2cac316c77e
+ size 1464180192
DeepCoder-1.5B-Preview-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c40b29c16b571e5c41696bcf990634a3422eae530a57c5e4788b0e6cd1e2fef
+ size 1894533312
DeepCoder-1.5B-Preview.imatrix ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f1216f0e1904bcd2878a1e8c4146c35d99966e67e2212b632af942cb0ffc941
+ size 2042214
README.md ADDED
@@ -0,0 +1,112 @@
+ ---
+ license: mit
+ library_name: transformers
+ datasets:
+ - PrimeIntellect/verifiable-coding-problems
+ - likaixin/TACO-verified
+ - livecodebench/code_generation_lite
+ language:
+ - en
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ pipeline_tag: text-generation
+ ---
+
+ <div align="center">
+ <span style="font-family: default; font-size: 1.5em;">DeepCoder-1.5B-Preview</span>
+ <div>
+ 🚀 Democratizing Reinforcement Learning for LLMs (RLLM) 🌟
+ </div>
+ </div>
+ <br>
+ <div align="center" style="line-height: 1;">
+ <a href="https://github.com/agentica-project/rllm" style="margin: 2px;">
+ <img alt="Code" src="https://img.shields.io/badge/RLLM-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51" target="_blank" style="margin: 2px;">
+ <img alt="Blog" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://x.com/Agentica_" style="margin: 2px;">
+ <img alt="X.ai" src="https://img.shields.io/badge/Agentica-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="https://huggingface.co/agentica-org" style="margin: 2px;">
+ <img alt="Hugging Face" src="https://img.shields.io/badge/Agentica-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ ## DeepCoder Overview
+ DeepCoder-1.5B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths.
+
+ ## Data
+ Our training dataset consists of approximately 24K unique problem-test pairs compiled from:
+ - TACO-Verified
+ - PrimeIntellect SYNTHETIC-1
+ - LiveCodeBench v5 (5/1/23-7/31/24)
+
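+ Each pair couples a problem statement with executable tests, so a generated solution can be verified automatically and turned into an RL reward. The snippet below is a hypothetical illustration of this idea; the field names and the all-or-nothing reward rule are assumptions, not the datasets' actual schema.
+
+ ```python
+ # Hypothetical sketch of a verifiable problem-test pair and a sparse,
+ # test-based reward; field names and the reward rule are illustrative only.
+ example = {
+     "problem": "Write a function add(a, b) that returns a + b.",
+     "tests": ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"],
+ }
+
+ def reward(candidate_code: str, tests: list[str]) -> float:
+     """Return 1.0 only if every test passes, else 0.0."""
+     scope: dict = {}
+     try:
+         exec(candidate_code, scope)   # define the candidate solution
+         for t in tests:
+             exec(t, scope)            # every assert must pass
+     except Exception:
+         return 0.0
+     return 1.0
+
+ print(reward("def add(a, b):\n    return a + b", example["tests"]))  # -> 1.0
+ ```
+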
+ ## Training Recipe
+
+ Our training recipe relies on an improved version of GRPO (GRPO+) and iterative context lengthening, introduced in DeepScaleR.
+
+ ### GRPO+
+
+ We enhance the original GRPO algorithm with insights from DAPO to enable more stable training; a minimal sketch of the resulting objective appears after this list:
+
+ - **Offline Difficulty Filtering:** DAPO employs online dynamic sampling, discarding both entirely correct and entirely incorrect samples on the fly. While this helps maintain a more stable effective batch size, it introduces significant runtime overhead due to rejection sampling. Instead, we perform offline difficulty filtering on a subset of coding problems to ensure the training dataset remains within a suitable difficulty range.
+ - **No Entropy Loss:** We observed that including an entropy loss term often led to instability, with entropy growing exponentially and ultimately collapsing training. To mitigate this, we eliminate the entropy loss entirely.
+ - **No KL Loss:** Eliminating the KL loss means the LLM is no longer constrained to stay within the trust region of the original SFT model. This removal also obviates the need to compute log probabilities for the reference policy, thereby accelerating training.
+ - **Overlong Filtering (from DAPO):** To preserve long-context reasoning, we mask the loss for truncated sequences. This technique enables DeepCoder to generalize to 64K-context inference despite being trained with a 32K context.
+ - **Clip High (from DAPO):** By increasing the upper bound in GRPO/PPO’s surrogate loss, we encourage more exploration and more stable entropy.
+
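+ For intuition, the sketch below (illustrative only, not the actual training code) shows how these pieces can combine into a single token-level clipped surrogate: no KL or entropy terms, an asymmetric "clip high" bound, and a mask that drops truncated (overlong) sequences. The function name, tensor shapes, and clip values are assumptions.
+
+ ```python
+ import torch
+
+ def grpo_plus_surrogate(logp_new, logp_old, advantages, loss_mask,
+                         clip_low=0.2, clip_high=0.28):
+     """Illustrative GRPO+-style objective over [batch, seq_len] float tensors.
+
+     - No KL or entropy terms appear, matching the bullets above.
+     - clip_high > clip_low gives the asymmetric "clip high" upper bound.
+     - loss_mask is 1 for trainable tokens and 0 for tokens of truncated
+       (overlong) sequences, so those contribute no gradient.
+     advantages are assumed to be group-normalized outcome rewards (GRPO-style).
+     """
+     ratio = torch.exp(logp_new - logp_old)          # importance ratio per token
+     unclipped = ratio * advantages
+     clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high) * advantages
+     per_token = -torch.minimum(unclipped, clipped) * loss_mask
+     return per_token.sum() / loss_mask.sum().clamp(min=1.0)
+ ```
+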
+ ### Iterative Context Lengthening
+
+ Our original `DeepScaleR-1.5B-Preview` scaled long-context training from 8K→16K→24K, achieving 33→38→43% on AIME respectively. Similarly, `DeepCoder-14B-Preview` was trained on 16K→32K, achieving 54→58% on LiveCodeBench (v5). `DeepCoder-14B-Preview` successfully generalizes to longer contexts, reaching 60.6% when evaluated at a 64K context.
+
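+ In code, the schedule amounts to little more than a loop that raises the maximum response length between RL stages. The stub below is a placeholder sketch (`run_rl_stage` is not a real API, and the stage lengths simply mirror the progressions described above):
+
+ ```python
+ # Placeholder sketch of iterative context lengthening; run_rl_stage is a stub,
+ # not a real training API, and the stage lengths are illustrative.
+ def run_rl_stage(max_response_length: int, resume_from: str) -> str:
+     """Stand-in for one RL stage; returns the checkpoint it would produce."""
+     print(f"RL stage: max_response_length={max_response_length}, resume_from={resume_from}")
+     return f"checkpoint_{max_response_length}"
+
+ checkpoint = "base-distilled-model"
+ for max_len in (8192, 16384, 24576):   # e.g. 8K -> 16K -> 24K
+     checkpoint = run_rl_stage(max_len, resume_from=checkpoint)
+ ```
+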
+ DeepCoder generalizes better to long contexts than the base distilled model, due to DAPO's overlong filtering. However, its longer responses are often truncated when the max length is capped at 16K, which can lower its scores.
+
+ | **Model** | **16K** | **32K** | **64K** |
+ | --- | --- | --- | --- |
+ | **DeepCoder-14B-Preview** | 45.6 | 57.9 | 60.6 |
+ | **DeepSeek-R1-Distill-Qwen-14B** | 50.2 | 53.0 | 53.0 |
+
+ A more detailed description of the training recipe can be found in our [blog post](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51).
+
+ ## Evaluation
+
+ We evaluate `DeepCoder-1.5B-Preview` on various coding benchmarks, including LiveCodeBench (LCB v5), Codeforces, and HumanEval+.
+
+ | **Model** | LCB (v5) (8/1/24-2/1/25) | Codeforces Rating | Codeforces Percentile | HumanEval+ |
+ | --- | --- | --- | --- | --- |
+ | **DeepCoder-1.5B-Preview** | **25.1** | **963** | **28.5** | **73.0** |
+ | **DeepSeek-R1-Distill-Qwen-1.5B** | 16.9 | 615 | 1.9 | 58.3 |
+
+ ## Serving DeepCoder
+ Our model can be served using popular high-performance inference systems:
+ - vLLM
+ - Hugging Face Text Generation Inference (TGI)
+ - SGLang
+ - TensorRT-LLM
+
+ All these systems support the OpenAI Chat Completions API format.
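+
+ For example, once one of these servers is running locally, the model can be queried with the standard `openai` Python client. The base URL, port, served model name, and sampling settings below are assumptions; adjust them to match your deployment (e.g. a vLLM OpenAI-compatible server).
+
+ ```python
+ from openai import OpenAI
+
+ # Assumed local OpenAI-compatible endpoint (vLLM/TGI/SGLang/TensorRT-LLM);
+ # the URL, placeholder API key, and model name depend on your server setup.
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ response = client.chat.completions.create(
+     model="agentica-org/DeepCoder-1.5B-Preview",
+     messages=[
+         {"role": "user",
+          "content": "Write a Python function that checks whether a string is a palindrome."},
+     ],
+     temperature=0.6,   # illustrative sampling settings, not official recommendations
+     top_p=0.95,
+     max_tokens=2048,
+ )
+ print(response.choices[0].message.content)
+ ```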
+
+ ## License
+ This project is released under the MIT License, reflecting our commitment to open and accessible AI development.
+ We believe in democratizing AI technology by making our work freely available for anyone to use, modify, and build upon.
+ This permissive license ensures that researchers, developers, and enthusiasts worldwide can leverage and extend our work without restrictions, fostering innovation and collaboration in the AI community.
+
+ ## Acknowledgement
+ - Our training experiments are powered by our heavily modified fork of [Verl](https://github.com/agentica-project/verl), an open-source post-training library.
+ - Notably, we train the 1.5B model with [verl pipeline](https://github.com/agentica-project/verl-pipeline), an extension of the original verl.
+ - Our model is trained on top of [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
+ - Our work is done as part of [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/) and [Berkeley AI Research](https://bair.berkeley.edu/).
+
+ ## Citation
+
+ ```bibtex
+ @misc{deepcoder2025,
+ title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
+ author={Michael Luo and Sijun Tan and Roy Huang and Ameen Patel and Alpay Ariyak and Qingyang Wu and Xiaoxiang Shi and Rachel Xin and Colin Cai and Maurice Weber and Ce Zhang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
+ howpublished={\url{https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51}},
+ note={Notion Blog},
+ year={2025}
+ }
+ ```