Added explicit CPU execution instructions and code example
README.md CHANGED
@@ -81,7 +81,7 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
 
-### Using Standard Transformers
+### Using Standard Transformers (GPU)
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -94,6 +94,31 @@ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
 # You can load it in 4-bit/8-bit using BitsAndBytes.
 ```
 
+### Running on CPU Only
+
+If you do not have a dedicated GPU, you can explicitly map the model to CPU. Note that inference will be significantly slower.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Chat2Find/Chat2Find-CPT"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+# Force the model to load into CPU RAM
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="cpu",
+    torch_dtype="auto",  # keep the checkpoint's native dtype (bfloat16) instead of float32 to save RAM
+)
+
+prompt = "ශ්‍රී ලංකාව ගැන කෙටි විස්තරයක්:"  # Sinhala: "A short description about Sri Lanka:"
+inputs = tokenizer(text=[prompt], return_tensors="pt").to("cpu")
+
+outputs = model.generate(**inputs, max_new_tokens=128)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
 ## Limitations & Bias
 
 While Chat2Find-CPT is significantly better at local languages than the base Qwen model, it may still exhibit biases present in the training data or the base model's internal knowledge. Users are encouraged to perform their own safety checks for specific deployment scenarios.
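The comment in the diff above mentions 4-bit/8-bit loading via BitsAndBytes without showing it. Below is a minimal sketch using the standard `transformers` `BitsAndBytesConfig` API; the specific choices here (NF4 quantization, bfloat16 compute dtype) are illustrative assumptions rather than settings documented for this model, and `bitsandbytes` quantization requires a CUDA GPU.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Chat2Find/Chat2Find-CPT"

# Illustrative 4-bit config; requires the bitsandbytes package and a CUDA GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",              # assumption: NF4 is a common default, not repo-specified
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```

For 8-bit instead, pass `load_in_8bit=True` and drop the 4-bit-specific fields.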
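One caveat on the new CPU snippet: `torch_dtype="auto"` keeps the weights in the checkpoint's bfloat16, which halves RAM use, but CPUs without native bfloat16 support may actually run float32 faster. Whether that trade is worth the doubled memory is hardware-dependent, so treat the following as a sketch to benchmark, not a recommendation.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: on CPUs that emulate bfloat16, float32 matmuls can be faster
# despite doubling weight memory. Benchmark both dtypes on your machine.
model = AutoModelForCausalLM.from_pretrained(
    "Chat2Find/Chat2Find-CPT",
    device_map="cpu",
    torch_dtype=torch.float32,
)

# Optionally cap PyTorch's intra-op CPU threads (illustrative value; tune to your core count).
torch.set_num_threads(8)
```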