Deepak7376 committed on
Commit 64f4083 · verified · 1 Parent(s): 9e2c3dd

Update README.md

Files changed (1): README.md (+85 −3)
---
license: mit
tags:
- text-generation
- quantized
- bitsandbytes
- deepseek
- 4bit
---

# Quantized DeepSeek-R1-Distill-Qwen-1.5B

![Model Preview](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true)

This is a **4-bit quantized version** of the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model, quantized with `bitsandbytes`.

## Model Details
- **Base Model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- **Quantization:** 4-bit (`NF4`) with double quantization (see the sketch below)
- **Library:** [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
- **Framework:** `transformers`
- **Use Case:** Text generation, chatbot applications, and other NLP tasks.
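
The exact script used to produce this checkpoint isn't included in the repo; the following is a minimal sketch of the standard `transformers` + `bitsandbytes` flow that matches the settings listed above (the `device_map` choice and push step are assumptions).

```python
# Hypothetical reproduction sketch: quantize the base model on the fly with the
# NF4 settings listed above, then push the 4-bit weights to the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
quantized_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Serializing 4-bit bitsandbytes weights requires a recent transformers/bitsandbytes.
model.push_to_hub(quantized_id)
tokenizer.push_to_hub(quantized_id)
```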

## How to Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit"

# 4-bit NF4 configuration matching how the checkpoint was quantized.
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config_4bit)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=1024,
    truncation=True,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
```
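
If you prefer not to use the pipeline wrapper, the same generation can be done with `model.generate` and the tokenizer's chat template. This is a sketch reusing `model` and `tokenizer` from the block above; the `max_new_tokens` value is an arbitrary choice.

```python
# Build the prompt with the model's chat template and generate directly.
prompt_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "suggest me top movies in 2021? <think>\n"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    prompt_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```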

Or, using the `pipeline` shortcut directly:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit")

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
```
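
With chat-style input, recent `transformers` versions return the conversation as a list of message dicts rather than a plain string; the exact output layout below is an assumption, so adjust the indexing if your version differs.

```python
# Pull the assistant's reply out of the pipeline result.
result = pipe(messages)
print(result[0]["generated_text"][-1]["content"])  # last message = assistant reply
```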

## Model Performance
Quantizing the model to 4-bit significantly reduces memory usage while largely preserving generation quality. Approximate memory footprints:

| Model Version   | Memory Usage |
|-----------------|--------------|
| Base Model      | ~3.5 GB      |
| 4-bit Quantized | ~1.5 GB      |
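
These figures are approximate and depend on your environment; a quick way to check on your own setup is the `get_memory_footprint()` helper on the loaded model. (Loading this repo is assumed to apply the stored 4-bit config automatically, which needs a CUDA GPU with `bitsandbytes` installed.)

```python
from transformers import AutoModelForCausalLM

# Load the quantized repo; the 4-bit config stored in config.json is applied on load.
model = AutoModelForCausalLM.from_pretrained("Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit")
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```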

## License
This model follows the `mit` license of the base model.

## Acknowledgments
- [DeepSeek-AI](https://huggingface.co/deepseek-ai) for the original model.
- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization support.