NLLB-200-3.3B — CTranslate2 int8

facebook/nllb-200-3.3B converted to CTranslate2 with int8 quantization for fast CPU inference.

Contents

model.bin: quantized weights (~3.3 GB)
config.json: CT2 model config
shared_vocabulary.json: shared NLLB vocabulary
tokenizer.json: fast tokenizer
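
For intuition about what the int8 weights are, here is a minimal sketch of symmetric per-row int8 quantization in plain Python. It is a generic illustration and not CTranslate2's actual implementation; the names quantize_row_int8 and dequantize_row are hypothetical. Each weight is stored as one signed byte plus a shared per-row scale, so about 3.3 billion parameters at one byte each is consistent with the ~3.3 GB size of model.bin.

```python
def quantize_row_int8(row):
    """Symmetric int8 quantization of one row of float weights (illustrative only)."""
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        # An all-zero row needs no scaling.
        return [0] * len(row), 1.0
    # Map the largest magnitude in the row to 127.
    scale = 127.0 / max_abs
    return [int(round(w * scale)) for w in row], scale


def dequantize_row(qrow, scale):
    """Recover approximate float weights from int8 values and the row scale."""
    return [q / scale for q in qrow]


row = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_row_int8(row)
restored = dequantize_row(q, scale)
# Largest round-trip error across the row.
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(q, max_error)
```

In this scheme the round-trip error per weight is at most half a quantization step, that is 0.5 / scale.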

Usage

import ctranslate2
from tokenizers import Tokenizer
from huggingface_hub import snapshot_download

model_dir = snapshot_download("Napron/nllb-200-3.3B-ct2-int8")
translator = ctranslate2.Translator(model_dir, device="cpu", compute_type="int8")
tokenizer = Tokenizer.from_file(f"{model_dir}/tokenizer.json")

src_lang, tgt_lang = "eng_Latn", "fra_Latn"
text = "Hello, how are you?"
source_tokens = tokenizer.encode(f"{src_lang} {text}").tokens
result = translator.translate_batch(
    [source_tokens],
    target_prefix=[[tgt_lang]],
    beam_size=4,
)
out_tokens = result[0].hypotheses[0][1:]  # drop the tgt_lang prefix token
ids = [tokenizer.token_to_id(t) for t in out_tokens]
print(tokenizer.decode(ids, skip_special_tokens=True))
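
To translate several sentences in one call, the steps above can be wrapped in a small helper. This is a sketch under the same assumptions as the example: translate_many is a hypothetical name, and it expects the translator and tokenizer objects created above to be passed in.

```python
def translate_many(translator, tokenizer, texts, src_lang, tgt_lang, beam_size=4):
    """Translate a list of strings with a single translate_batch call."""
    # Same input format as the single-sentence example: language code, space, text.
    sources = [tokenizer.encode(f"{src_lang} {text}").tokens for text in texts]
    results = translator.translate_batch(
        sources,
        target_prefix=[[tgt_lang]] * len(sources),
        beam_size=beam_size,
    )
    outputs = []
    for result in results:
        tokens = result.hypotheses[0]
        # Drop the forced target-language token if it leads the hypothesis.
        if tokens and tokens[0] == tgt_lang:
            tokens = tokens[1:]
        # Skip any token the tokenizer does not know (token_to_id returns None).
        ids = [i for i in (tokenizer.token_to_id(t) for t in tokens) if i is not None]
        outputs.append(tokenizer.decode(ids, skip_special_tokens=True))
    return outputs
```

For example, translate_many(translator, tokenizer, ["Hello, how are you?", "Good morning."], "eng_Latn", "fra_Latn") returns a list with one translated string per input, in the same order.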

License & attribution

Derivative of facebook/nllb-200-3.3B. Original model and this conversion are licensed under CC-BY-NC 4.0. Non-commercial use only.
