alamios committed on
Commit 9eea5a1 · verified · 1 Parent(s): 4235257

Update README.md

Files changed (1)
  1. README.md +29 -29
README.md CHANGED
@@ -1,30 +1,30 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-Coder-0.5B
- pipeline_tag: text-generation
- library_name: transformers
- tags:
- - code
- - qwen
- - qwen2.5
- - qwen-coder
- - codeqwen
- - deepseek
- ---
-
- # DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B
-
- **Updated**
-
- This model is trained on CODE outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as a draft model for speculative decoding.
-
- It's specifically intended for users of 3090/4090 GPUs, allowing you to run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context while speeding up generation without sacrificing context length or model quality.
-
- # Data info
-
- The data consists of code tasks collected from various datasets. The model has been trained for 2 epochs on 2.5k unique examples, for a total of 7.6 million tokens per epoch.
-
  Since data generation was done using spare GPU time, I may publish a further trained version later.
 
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen2.5-Coder-0.5B
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - code
+ - qwen
+ - qwen2.5
+ - qwen-coder
+ - codeqwen
+ - deepseek
+ ---
+
+ # DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B
+
+ **Updated to v1**
+
+ This model is trained on CODE outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as a draft model for speculative decoding.
+
+ It's specifically intended for users of 3090/4090 GPUs, allowing you to run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context while speeding up generation without sacrificing context length or model quality.
+
+ # Data info
+
+ The data consists of code tasks collected from various datasets. The model has been trained for 2 epochs on 2.5k unique examples, for a total of 7.6 million tokens per epoch.
+
  Since data generation was done using spare GPU time, I may publish a further trained version later.
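
For reference, a minimal sketch of how a small draft model like this can be paired with the larger target model via the transformers library's assisted generation (its speculative-decoding mechanism). The repo id `alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B` and the prompt are assumptions for illustration; GGUF users would instead point llama.cpp's draft-model option at the quantized draft.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Target model: the 32B distill whose CODE outputs the draft was trained on.
target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
# Draft model repo id is an assumption based on the README title.
draft_id = "alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# assistant_model enables assisted (speculative) generation:
# the draft proposes tokens that the target verifies in parallel.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```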