michaelfeil committed
Commit: 22233c6
Parent: d9eb9e6

Update README.md

Files changed (1):
  1. README.md +0 -28
README.md CHANGED
@@ -218,34 +218,6 @@ inference: false
  Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
 
  quantized version of [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)
- ```bash
- pip install hf-hub-ctranslate2>=2.12.0 ctranslate2>=3.16.0
- ```
-
- ```python
- # from transformers import AutoTokenizer
- model_name = "michaelfeil/ct2fast-nllb-200-distilled-1.3B"
-
-
- from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
- model = TranslatorCT2fromHfHub(
-     # load in int8 on CUDA
-     model_name_or_path=model_name,
-     device="cuda",
-     compute_type="int8_float16",
-     # tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
- )
- outputs = model.generate(
-     text=["def fibonnaci(", "User: How are you doing? Bot:"],
-     max_length=64,
- )
- print(outputs)
- ```
-
- Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
- and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
- - `compute_type=int8_float16` for `device="cuda"`
- - `compute_type=int8` for `device="cpu"`
 
  Converted on 2023-06-23 using
  ```
 
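The snippet removed by this commit was the README's only loading example. For reference, below is a minimal sketch of loading this quantized checkpoint directly with `ctranslate2`, following the NLLB translation pattern from the CTranslate2 documentation instead of the removed `hf-hub-ctranslate2` wrapper. The language codes (`eng_Latn`, `fra_Latn`) and the example sentence are illustrative assumptions, not from this repo.

```python
# Sketch: direct CTranslate2 usage for the quantized NLLB checkpoint.
# Assumes the repo contains converted CTranslate2 weights (per the README)
# and that a CUDA device is available; use compute_type="int8" on CPU.
import ctranslate2
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download the converted weights from the Hub to a local directory.
model_path = snapshot_download("michaelfeil/ct2fast-nllb-200-distilled-1.3B")

# int8_float16 is the compute type the removed README text recommended for CUDA.
translator = ctranslate2.Translator(
    model_path, device="cuda", compute_type="int8_float16"
)

# The tokenizer comes from the original, non-quantized model.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-1.3B", src_lang="eng_Latn"
)

# Tokenize the source sentence (example text is an assumption).
source = tokenizer.convert_ids_to_tokens(
    tokenizer.encode("Quantization cuts memory use by 2x-4x.")
)

# NLLB expects the target language code as a decoding prefix.
results = translator.translate_batch([source], target_prefix=[["fra_Latn"]])
target_tokens = results[0].hypotheses[0][1:]  # drop the language-code token
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target_tokens)))
```

As the removed README lines note, the checkpoint targets ctranslate2>=3.16.0; on CPU, swap in `device="cpu"` and `compute_type="int8"`.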