NLLB-200-3.3B - CTranslate2 int8
facebook/nllb-200-3.3B converted to CTranslate2 with int8 quantization
for fast CPU inference.
Contents
- model.bin: quantized weights (~3.3 GB)
- config.json: CT2 model config
- shared_vocabulary.json: shared NLLB vocabulary
- tokenizer.json: fast tokenizer
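A quick sanity check after downloading can confirm that the four files listed above are present before loading the model. The sketch below uses only the standard library; the helper name `missing_files` and the constant `EXPECTED_FILES` are illustrative and not part of this repository or of CTranslate2.

```python
from pathlib import Path
from typing import List

# File names taken from the Contents list of this model card.
EXPECTED_FILES = (
    "model.bin",
    "config.json",
    "shared_vocabulary.json",
    "tokenizer.json",
)


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files(model_dir)` on the directory returned by `snapshot_download` should give an empty list when the download is complete.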
Usage
import ctranslate2
from tokenizers import Tokenizer
from huggingface_hub import snapshot_download
model_dir = snapshot_download("Napron/nllb-200-3.3B-ct2-int8")
translator = ctranslate2.Translator(model_dir, device="cpu", compute_type="int8")
tokenizer = Tokenizer.from_file(f"{model_dir}/tokenizer.json")
src_lang, tgt_lang = "eng_Latn", "fra_Latn"
text = "Hello, how are you?"
source_tokens = tokenizer.encode(f"{src_lang} {text}").tokens
result = translator.translate_batch(
    [source_tokens],
    target_prefix=[[tgt_lang]],
    beam_size=4,
)
out_tokens = result[0].hypotheses[0][1:]  # drop the tgt_lang prefix token
# Map the output tokens back to ids so the tokenizer can decode them.
out_ids = [tokenizer.token_to_id(t) for t in out_tokens]
print(tokenizer.decode(out_ids, skip_special_tokens=True))
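To translate several sentences at once, the same steps can be wrapped in a small helper that passes all inputs to a single `translate_batch` call. This is a minimal sketch, not part of the repository: `translate_texts` and `strip_target_prefix` are hypothetical names, the source formatting simply mirrors the single-sentence example above, and the helper assumes `translator` and `tokenizer` are the objects created there.

```python
from typing import List, Sequence


def strip_target_prefix(tokens: Sequence[str], tgt_lang: str) -> List[str]:
    """Drop the leading language token, but only if it is actually present."""
    tokens = list(tokens)
    return tokens[1:] if tokens and tokens[0] == tgt_lang else tokens


def translate_texts(translator, tokenizer, texts, src_lang, tgt_lang, beam_size=4):
    """Translate a list of sentences with one translate_batch call."""
    # Same source formatting as the single-sentence example.
    sources = [tokenizer.encode(f"{src_lang} {t}").tokens for t in texts]
    results = translator.translate_batch(
        sources,
        target_prefix=[[tgt_lang]] * len(sources),  # one prefix per input
        beam_size=beam_size,
    )
    outputs = []
    for result in results:
        tokens = strip_target_prefix(result.hypotheses[0], tgt_lang)
        ids = [tokenizer.token_to_id(t) for t in tokens]
        outputs.append(tokenizer.decode(ids, skip_special_tokens=True))
    return outputs
```

For example, `translate_texts(translator, tokenizer, ["Hello, how are you?", "Good morning."], "eng_Latn", "fra_Latn")` returns one decoded string per input sentence, in the same order.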
License & attribution
Derivative of facebook/nllb-200-3.3B. Both the original model and this conversion are licensed under CC-BY-NC 4.0, for non-commercial use only.