michaelfeil committed
Commit: 22233c6
Parent: d9eb9e6

Update README.md

Files changed (1):
  1. README.md +0 -28
README.md CHANGED
@@ -218,34 +218,6 @@ inference: false
  Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
 
  quantized version of [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)
- ```bash
- pip install hf-hub-ctranslate2>=2.12.0 ctranslate2>=3.16.0
- ```
-
- ```python
- # from transformers import AutoTokenizer
- model_name = "michaelfeil/ct2fast-nllb-200-distilled-1.3B"
-
-
- from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
- model = TranslatorCT2fromHfHub(
-     # load in int8 on CUDA
-     model_name_or_path=model_name,
-     device="cuda",
-     compute_type="int8_float16",
-     # tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
- )
- outputs = model.generate(
-     text=["def fibonnaci(", "User: How are you doing? Bot:"],
-     max_length=64,
- )
- print(outputs)
- ```
-
- Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
- and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
- - `compute_type=int8_float16` for `device="cuda"`
- - `compute_type=int8` for `device="cpu"`
 
  Converted on 2023-06-23 using
  ```
 
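The snippet removed by this commit was the README's only loading example. For reference, below is a minimal sketch of loading this quantized checkpoint directly with `ctranslate2`, following the NLLB translation pattern from the CTranslate2 documentation instead of the removed `hf-hub-ctranslate2` wrapper. The language codes (`eng_Latn`, `fra_Latn`) and the example sentence are illustrative assumptions, not from this repo.

```python
# Sketch: direct CTranslate2 usage for the quantized NLLB checkpoint.
# Assumes the repo contains converted CTranslate2 weights (per the README)
# and that a CUDA device is available; use compute_type="int8" on CPU.
import ctranslate2
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download the converted weights from the Hub to a local directory.
model_path = snapshot_download("michaelfeil/ct2fast-nllb-200-distilled-1.3B")

# int8_float16 is the compute type the removed README text recommended for CUDA.
translator = ctranslate2.Translator(
    model_path, device="cuda", compute_type="int8_float16"
)

# The tokenizer comes from the original, non-quantized model.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-1.3B", src_lang="eng_Latn"
)

# Tokenize the source sentence (example text is an assumption).
source = tokenizer.convert_ids_to_tokens(
    tokenizer.encode("Quantization cuts memory use by 2x-4x.")
)

# NLLB expects the target language code as a decoding prefix.
results = translator.translate_batch([source], target_prefix=[["fra_Latn"]])
target_tokens = results[0].hypotheses[0][1:]  # drop the language-code token
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target_tokens)))
```

As the removed README lines note, the checkpoint targets ctranslate2>=3.16.0; on CPU, swap in `device="cpu"` and `compute_type="int8"`.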