Deepak7376 committed on
Commit 64f4083 · verified · 1 Parent(s): 9e2c3dd

Update README.md

Files changed (1): README.md (+85 −3)
---
license: mit
tags:
- text-generation
- quantized
- bitsandbytes
- deepseek
- 4bit
---

# Quantized DeepSeek-R1-Distill-Qwen-1.5B

![Model Preview](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true)

This is a **4-bit quantized version** of the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model, quantized with `bitsandbytes`.

## Model Details
- **Base Model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
- **Quantization:** 4-bit (`NF4`) with double quantization (see the sketch below)
- **Library:** [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
- **Framework:** `transformers`
- **Use Case:** Text generation, chatbot applications, and other NLP tasks.
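
The exact script used to produce this checkpoint isn't included in the repo; the following is a minimal sketch of the standard `transformers` + `bitsandbytes` flow that matches the settings listed above (the `device_map` choice and push step are assumptions).

```python
# Hypothetical reproduction sketch: quantize the base model on the fly with the
# NF4 settings listed above, then push the 4-bit weights to the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
quantized_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Serializing 4-bit bitsandbytes weights requires a recent transformers/bitsandbytes.
model.push_to_hub(quantized_id)
tokenizer.push_to_hub(quantized_id)
```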

## How to Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit"

# 4-bit NF4 configuration matching how the checkpoint was quantized.
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config_4bit)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=1024,
    truncation=True,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
```
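
If you prefer not to use the pipeline wrapper, the same generation can be done with `model.generate` and the tokenizer's chat template. This is a sketch reusing `model` and `tokenizer` from the block above; the `max_new_tokens` value is an arbitrary choice.

```python
# Build the prompt with the model's chat template and generate directly.
prompt_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "suggest me top movies in 2021? <think>\n"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    prompt_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```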

Or, using the `pipeline` shortcut directly:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit")

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
```
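
With chat-style input, recent `transformers` versions return the conversation as a list of message dicts rather than a plain string; the exact output layout below is an assumption, so adjust the indexing if your version differs.

```python
# Pull the assistant's reply out of the pipeline result.
result = pipe(messages)
print(result[0]["generated_text"][-1]["content"])  # last message = assistant reply
```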

## Model Performance
Quantizing the model to 4-bit significantly reduces memory usage while largely preserving generation quality. Approximate memory footprints:

| Model Version   | Memory Usage |
|-----------------|--------------|
| Base Model      | ~3.5 GB      |
| 4-bit Quantized | ~1.5 GB      |
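
These figures are approximate and depend on your environment; a quick way to check on your own setup is the `get_memory_footprint()` helper on the loaded model. (Loading this repo is assumed to apply the stored 4-bit config automatically, which needs a CUDA GPU with `bitsandbytes` installed.)

```python
from transformers import AutoModelForCausalLM

# Load the quantized repo; the 4-bit config stored in config.json is applied on load.
model = AutoModelForCausalLM.from_pretrained("Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-4bit")
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```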

## License
This model follows the `mit` license of the base model.

## Acknowledgments
- [DeepSeek-AI](https://huggingface.co/deepseek-ai) for the original model.
- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) for quantization support.