---
license: mit
datasets:
- ahmedheakl/arzen-llm-dataset
language:
- ar
- en
metrics:
- bleu
- ecody726/bertscore
- meteor
library_name: transformers
pipeline_tag: translation
---

## How to use

Just install `peft`, `transformers`, `accelerate`, `bitsandbytes`, and `torch` first:

```bash
pip install peft accelerate bitsandbytes transformers torch
```

Then log in with your Hugging Face token to get access to the base model:

```bash
huggingface-cli login --token <your_token>
```

Then load the model (an optional 4-bit loading sketch is included at the end of this card if you are short on GPU memory):

```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

peft_model_id = "ahmedheakl/arazn-llama3-english"
peft_config = PeftConfig.from_pretrained(peft_model_id)
base_model_name = peft_config.base_model_name_or_path

# Load the base model in bfloat16, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, peft_model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
```

Then run inference:

```python
import torch

# Llama 3 chat template: the system turn carries the translation instruction,
# the user turn carries the code-switched source text
raw_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Translate the following code-switched Arabic-English-mixed text to English only.<|eot_id|><|start_header_id|>user<|end_header_id|>

{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

def inference(prompt: str) -> str:
    prompt = raw_prompt.format(source=prompt)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    generated_ids = model.generate(
        **inputs,
        use_cache=True,
        num_return_sequences=1,
        max_new_tokens=100,
        # Greedy decoding by default; uncomment the two lines below to sample instead
        # do_sample=True,
        num_beams=1,
        # temperature=0.7,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    outputs = tokenizer.batch_decode(generated_ids)[0]
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    # Keep only the assistant turn: strip the echoed prompt and the end-of-turn token
    return outputs.split("assistant<|end_header_id|>\n\n")[-1].split("<|eot_id|>")[0]

print(inference("أنا أحب الbanana"))  # I love bananas
```

**Please see the paper & code for more information:**
- https://github.com/ahmedheakl/arazn-llm
- https://arxiv.org/abs/2406.18120

## Citation

**BibTeX:**

```
@article{heakl2024arzen,
  title={ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs},
  author={Heakl, Ahmed and Zaghloul, Youssef and Ali, Mennatullah and Hossam, Rania and Gomaa, Walid},
  journal={arXiv preprint arXiv:2406.18120},
  year={2024}
}
```

## Model Card Authors

- Email: ahmed.heakl@ejust.edu.eg
- LinkedIn: https://linkedin.com/in/ahmed-heakl
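
## Optional: 4-bit loading

The install step above pulls in `bitsandbytes`, which lets you load the base model with 4-bit quantized weights to cut GPU memory use. Below is a minimal sketch; the quantization settings (NF4 with bfloat16 compute) are illustrative assumptions, not the authors' configuration.

```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

peft_model_id = "ahmedheakl/arazn-llama3-english"
peft_config = PeftConfig.from_pretrained(peft_model_id)

# Assumed settings: 4-bit NF4 weights with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model, then attach the LoRA adapter as before
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
```

If you load the base model unquantized instead, calling `model.merge_and_unload()` folds the LoRA weights into the base model, which removes the adapter overhead at inference time.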