Update README.md
README.md CHANGED
---
license: mit
language:
- wo
- fr
metrics:
- bleu
pipeline_tag: translation
tags:
- text-generation-inference
---

# Model Documentation: Wolof to French Translation with NLLB-200

## Model Overview

This document describes a machine translation model fine-tuned from Meta's NLLB-200 for translating from Wolof to French. The model, hosted at `cifope/nllb-200-wo-fr-distilled-600M`, is based on the distilled 600M-parameter variant of NLLB-200 and has been fine-tuned specifically for translation between Wolof and French.

## Dependencies

The model requires the `transformers` library by Hugging Face. Ensure that you have the library installed:

```bash
pip install transformers
```
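
In a typical setup the model also needs PyTorch and the tokenizer needs `sentencepiece`; this is an assumption about a standard environment rather than an official requirements list, so install them only if they are missing.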

## Setup

Import the necessary classes from the `transformers` library:

```python
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer
```

Initialize the model and tokenizer:

```python
# The fine-tuned weights come from the cifope repository; the tokenizer is
# loaded from the base NLLB-200 checkpoint and patched below for the new language code.
model = AutoModelForSeq2SeqLM.from_pretrained('cifope/nllb-200-wo-fr-distilled-600M')
tokenizer = NllbTokenizer.from_pretrained('facebook/nllb-200-distilled-600M')
```
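
If a GPU is available, it is usually worth moving the model onto it before generating. This is a minimal sketch assuming a standard PyTorch install:

```python
import torch

# Use the GPU when present, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()  # inference only; no gradients are needed for translation
```

The translation functions below already move their inputs to `model.device`, so nothing else needs to change.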

## Tokenizer Customization

The fine-tuned model uses a new language code, `wol_Wol`, that the stock NLLB tokenizer does not know about. The `fix_tokenizer` function registers this code with the tokenizer:

```python
def fix_tokenizer(tokenizer, new_lang='wol_Wol'):
    # Vocabulary size, not counting the new code if it was already added.
    old_len = len(tokenizer) - int(new_lang in tokenizer.added_tokens_encoder)
    # Assign the new language code the last regular id and register the reverse mapping.
    tokenizer.lang_code_to_id[new_lang] = old_len - 1
    tokenizer.id_to_lang_code[old_len - 1] = new_lang
    # Move <mask> out of the way so it does not collide with the new code.
    tokenizer.fairseq_tokens_to_ids["<mask>"] = len(tokenizer.sp_model) + len(tokenizer.lang_code_to_id) + tokenizer.fairseq_offset
    tokenizer.fairseq_tokens_to_ids.update(tokenizer.lang_code_to_id)
    tokenizer.fairseq_ids_to_tokens = {v: k for k, v in tokenizer.fairseq_tokens_to_ids.items()}
    # Register the code as an additional special token and reset cached token maps.
    if new_lang not in tokenizer._additional_special_tokens:
        tokenizer._additional_special_tokens.append(new_lang)
    tokenizer.added_tokens_encoder = {}
    tokenizer.added_tokens_decoder = {}

fix_tokenizer(tokenizer)
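
This helper follows a known recipe that modifies internal `NllbTokenizer` attributes (`lang_code_to_id`, `fairseq_tokens_to_ids`), so it may need adjusting on newer `transformers` releases. A quick sanity check, not part of the original recipe, that the new code is actually registered:

```python
# The new language code should resolve to a valid id and round-trip back.
wol_id = tokenizer.convert_tokens_to_ids('wol_Wol')
print('wol_Wol id:', wol_id)
print('round-trip:', tokenizer.convert_ids_to_tokens(wol_id))
```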

## Translation Functions

### Translate from French to Wolof

The `translate` function translates text from French to Wolof:

```python
def translate(text, src_lang='fra_Latn', tgt_lang='wol_Wol', a=16, b=1.5, max_input_length=1024, **kwargs):
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        # Output budget grows linearly with the input length: a + b * len(input).
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)
```
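
Any extra keyword arguments are forwarded to `model.generate`, so decoding can be tuned per call. The example below uses beam search as an illustration; these settings are not prescribed by the model card:

```python
# Beam search with a repetition constraint; both arguments go straight to model.generate.
wolof = translate("Bonjour, comment allez-vous ?", num_beams=4, no_repeat_ngram_size=3)
print(wolof)
```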

### Translate from Wolof to French

The `reversed_translate` function translates text from Wolof to French; it mirrors `translate` with the source and target languages swapped:

```python
def reversed_translate(text, src_lang='wol_Wol', tgt_lang='fra_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
    result = model.generate(
        **inputs.to(model.device),
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
        **kwargs
    )
    return tokenizer.batch_decode(result, skip_special_tokens=True)
```
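
Because the tokenizer is called with `padding=True`, both functions also accept a list of sentences and return one translation per input. A minimal batching sketch, where the second sentence is purely illustrative:

```python
# Translate several French sentences in one batch; results come back in input order.
french_sentences = [
    "L'argent peut être échangé à la seule banque des îles située à Stanley",
    "Bonjour, comment allez-vous ?",
]
wolof_sentences = translate(french_sentences)
for src, tgt in zip(french_sentences, wolof_sentences):
    print(src, "->", tgt)
```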

## Usage

To translate text, call the `translate` or `reversed_translate` function with the appropriate text and parameters. Both functions return a list of decoded strings, one per input sentence. For example:

```python
french_text = "L'argent peut être échangé à la seule banque des îles située à Stanley"
wolof_translation = translate(french_text)
print(wolof_translation)

wolof_text = "alkaati yi tàmbali nañu xàll léegi kilifa gi ñów"
french_translation = reversed_translate(wolof_text)
print(french_translation)
```