Chat2Find committed
Commit f97feb5 · verified · 1 Parent(s): 31adec0

Added explicit CPU execution instructions and code example

Files changed (1): README.md (+26 −1)
README.md CHANGED
@@ -81,7 +81,7 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```
 
-### Using Standard Transformers
+### Using Standard Transformers (GPU)
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -94,6 +94,31 @@ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
 # You can load it in 4-bit/8-bit using BitsAndBytes.
 ```
 
+### Running on CPU Only
+
+If you do not have a dedicated GPU, you can explicitly map the model to CPU. Note that inference will be significantly slower.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Chat2Find/Chat2Find-CPT"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+# Force the model to load into CPU RAM
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="cpu",
+    torch_dtype="auto"  # loads the checkpoint's native dtype (bfloat16) to save RAM
+)
+
+prompt = "ශ්‍රී ලංකාව ගැන කෙටි විස්තරයක්:"  # "A short description about Sri Lanka:"
+inputs = tokenizer(text=[prompt], return_tensors="pt").to("cpu")
+
+outputs = model.generate(**inputs, max_new_tokens=128)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+
 ## Limitations & Bias
 
 While Chat2Find-CPT is significantly better at local languages than the base Qwen model, it may still exhibit biases present in the training data or the base model's internal knowledge. Users are encouraged to perform their own safety checks for specific deployment scenarios.
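The "save RAM" comment on `torch_dtype` matters because on CPU the full weight tensor sits in system memory, so the dtype roughly halves or doubles the footprint. A back-of-envelope sketch of that rule of thumb (the 7B parameter count is an illustrative assumption; the diff does not state the model's size, and this ignores activations and the KV cache):

```python
def weight_ram_gib(num_params: float, bytes_per_param: int) -> float:
    """Rough RAM needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

# bfloat16 = 2 bytes/param, float32 = 4 bytes/param
print(f"7B in bf16: {weight_ram_gib(7e9, 2):.1f} GiB")  # ~13 GiB
print(f"7B in fp32: {weight_ram_gib(7e9, 4):.1f} GiB")  # ~26 GiB
```

This is why `torch_dtype="auto"` (keeping the checkpoint's half-precision weights) is preferable to letting the load default to float32 on a RAM-constrained machine.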
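The GPU section's comment "You can load it in 4-bit/8-bit using BitsAndBytes." can be made concrete with a quantization config passed at load time. A configuration sketch only, not something the README specifies: it requires a CUDA GPU with `bitsandbytes` installed, and the NF4/bfloat16 choices below are common defaults, not values taken from this commit.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization, computing in bfloat16 (commonly used defaults)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The config is passed via quantization_config instead of changing the dtype
model = AutoModelForCausalLM.from_pretrained(
    "Chat2Find/Chat2Find-CPT",
    quantization_config=bnb_config,
    device_map="auto",
)
```

4-bit loading cuts weight memory to roughly a quarter of the bfloat16 footprint, at some cost in output quality.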