NLLB-200-3.3B - CTranslate2 int8

facebook/nllb-200-3.3B converted to CTranslate2 with int8 quantization for fast CPU inference.
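The exact conversion command used for this repository is not recorded here. As a rough sketch, a conversion of this kind can be reproduced with CTranslate2's `TransformersConverter`. The helper name `convert_nllb`, the output directory, and the `copy_files` choice are assumptions for illustration, not the settings actually used.

```python
def convert_nllb(output_dir, model_name="facebook/nllb-200-3.3B", quantization="int8"):
    """Convert a Hugging Face NLLB checkpoint to CTranslate2 format.

    Assumption: this mirrors a typical conversion, not necessarily the one
    used to build this repository.
    """
    # Imported lazily so this module loads even without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_name,
        # Assumption: copy the fast tokenizer next to the converted weights.
        copy_files=["tokenizer.json"],
    )
    # quantization="int8" stores the model weights as 8-bit integers.
    return converter.convert(output_dir, quantization=quantization, force=True)


# Example call (hypothetical output directory; downloads the full checkpoint
# and needs transformers and torch installed):
# convert_nllb("nllb-200-3.3B-ct2-int8")
```

Running the conversion loads the full-precision checkpoint, so it likely needs considerably more memory than the quantized model does at inference time.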

Contents

  • model.bin - quantized weights (~3.3 GB)
  • config.json - CT2 model config
  • shared_vocabulary.json - shared NLLB vocabulary
  • tokenizer.json - fast tokenizer
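After downloading, it can be useful to confirm that all four files are present before loading the model. The following is a minimal sketch using only the standard library; `missing_files` is a hypothetical helper, and the file names are taken from the list above.

```python
from pathlib import Path

# File names taken from the Contents list.
EXPECTED_FILES = ("model.bin", "config.json", "shared_vocabulary.json", "tokenizer.json")


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means the directory is complete; otherwise the returned names show which files still need to be downloaded.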

Usage

import ctranslate2
from tokenizers import Tokenizer
from huggingface_hub import snapshot_download

model_dir = snapshot_download("Napron/nllb-200-3.3B-ct2-int8")
translator = ctranslate2.Translator(model_dir, device="cpu", compute_type="int8")
tokenizer = Tokenizer.from_file(f"{model_dir}/tokenizer.json")

src_lang, tgt_lang = "eng_Latn", "fra_Latn"
text = "Hello, how are you?"
source_tokens = tokenizer.encode(f"{src_lang} {text}").tokens
result = translator.translate_batch(
    [source_tokens],
    target_prefix=[[tgt_lang]],
    beam_size=4,
)
out_tokens = result[0].hypotheses[0][1:]  # drop the tgt_lang prefix token
# Map token strings back to vocabulary ids, then decode to plain text.
out_ids = [tokenizer.token_to_id(t) for t in out_tokens]
print(tokenizer.decode(out_ids, skip_special_tokens=True))
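Since `translate_batch` accepts a list of token sequences, several sentences can be translated in one call. The sketch below wraps the steps above in a hypothetical helper, `translate_texts`, which assumes `translator` behaves like `ctranslate2.Translator` and `tokenizer` like `tokenizers.Tokenizer`. It keeps the same source formatting as the single-sentence example and only drops the first output token when it really is the target language code.

```python
def translate_texts(translator, tokenizer, texts, src_lang, tgt_lang, beam_size=4):
    """Translate several strings with a single translate_batch call."""
    # Same source formatting as the single-sentence example.
    batch = [tokenizer.encode(f"{src_lang} {text}").tokens for text in texts]
    results = translator.translate_batch(
        batch,
        target_prefix=[[tgt_lang] for _ in batch],  # one prefix per input
        beam_size=beam_size,
    )
    outputs = []
    for result in results:
        tokens = result.hypotheses[0]  # best hypothesis for this input
        # Drop the language-code prefix only if it is actually present.
        if tokens and tokens[0] == tgt_lang:
            tokens = tokens[1:]
        ids = [tokenizer.token_to_id(t) for t in tokens]
        outputs.append(tokenizer.decode(ids, skip_special_tokens=True))
    return outputs
```

With the objects created earlier, a call might look like `translate_texts(translator, tokenizer, ["Hello, how are you?", "Good morning."], "eng_Latn", "fra_Latn")`, which returns one translated string per input, in order.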

License & attribution

This model is a derivative of facebook/nllb-200-3.3B. Both the original model and this conversion are licensed under CC-BY-NC 4.0, which permits non-commercial use only.
