Added explicit CPU execution instructions and code example
README.md CHANGED
@@ -81,7 +81,7 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
 
-### Using Standard Transformers
+### Using Standard Transformers (GPU)
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -94,6 +94,31 @@ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
 # You can load it in 4-bit/8-bit using BitsAndBytes.
 ```
 
+### Running on CPU Only
+
+If you do not have a dedicated GPU, you can explicitly map the model to CPU. Note that inference will be significantly slower.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Chat2Find/Chat2Find-CPT"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+# Force the model to load into CPU RAM
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="cpu",
+    torch_dtype="auto",  # keep the checkpoint's native dtype (bfloat16) instead of float32 to save RAM
+)
+
+prompt = "ශ්‍රී ලංකාව ගැන කෙටි විස්තරයක්:"  # Sinhala: "A short description about Sri Lanka:"
+inputs = tokenizer(text=[prompt], return_tensors="pt").to("cpu")
+
+outputs = model.generate(**inputs, max_new_tokens=128)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
 ## Limitations & Bias
 
 While Chat2Find-CPT is significantly better at local languages than the base Qwen model, it may still exhibit biases present in the training data or the base model's internal knowledge. Users are encouraged to perform their own safety checks for specific deployment scenarios.
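The comment in the diff above mentions 4-bit/8-bit loading via BitsAndBytes without showing it. Below is a minimal sketch using the standard `transformers` `BitsAndBytesConfig` API; the specific choices here (NF4 quantization, bfloat16 compute dtype) are illustrative assumptions rather than settings documented for this model, and `bitsandbytes` quantization requires a CUDA GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Chat2Find/Chat2Find-CPT"

# Illustrative 4-bit config; requires the bitsandbytes package and a CUDA GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",              # assumption: NF4 is a common default, not repo-specified
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```

For 8-bit instead, pass `load_in_8bit=True` and drop the 4-bit-specific fields.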
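One caveat on the new CPU snippet: `torch_dtype="auto"` keeps the weights in the checkpoint's bfloat16, which halves RAM use, but CPUs without native bfloat16 support may actually run float32 faster. Whether that trade is worth the doubled memory is hardware-dependent, so treat the following as a sketch to benchmark, not a recommendation.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: on CPUs that emulate bfloat16, float32 matmuls can be faster
# despite doubling weight memory. Benchmark both dtypes on your machine.
model = AutoModelForCausalLM.from_pretrained(
    "Chat2Find/Chat2Find-CPT",
    device_map="cpu",
    torch_dtype=torch.float32,
)

# Optionally cap PyTorch's intra-op CPU threads (illustrative value; tune to your core count).
torch.set_num_threads(8)
```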