---
tags:
- Multilingual
license: mit
---

### Model Sources

- **Paper**: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
- **Link**: https://arxiv.org/pdf/2407.05975
- **Repository**: https://github.com/CONE-MT/LLaMAX/

### Model Description

🔥 LLaMAX-7B-X-CSQA is a multilingual commonsense reasoning model. It is obtained by fully fine-tuning the multilingual base model [LLaMAX-7B](https://huggingface.co/LLaMAX/LLaMAX-7B) on five English commonsense reasoning datasets: X-CSQA, ARC-Easy, ARC-Challenge, OpenBookQA, and QASC.

🔥 Compared with fine-tuning Llama-2 in the same setting, LLaMAX-7B-X-CSQA raises the average accuracy on the X-CSQA dataset by 4.2 points (from 50.9 to 55.1).

### Experiments

Accuracy (%) on X-CSQA, per language:

| X-CSQA | Avg. | Sw | Ur | Hi | Ar | Vi | Ja | Pl | Zh | Nl | Ru | It | De | Pt | Fr | Es | En |
|------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| Llama2-7B-X-CSQA | 50.9 | 23.2 | 24.7 | 32.9 | 32.4 | 51.0 | 50.0 | 51.5 | 55.6 | 56.9 | 55.8 | 58.8 | 59.9 | 60.4 | 61.8 | 61.9 | 78.1 |
| LLaMAX-7B-X-CSQA | 55.1 | 43.5 | 39.0 | 44.1 | 45.1 | 54.0 | 49.9 | 54.6 | 58.2 | 58.9 | 57.1 | 59.1 | 59.0 | 60.9 | 61.6 | 62.7 | 74.0 |

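
The per-language differences behind the average can be recomputed directly from the two rows of the table. The snippet below is a small sketch that only re-uses the numbers reported above; the variable names are illustrative and not part of the LLaMAX codebase.

```python
from statistics import mean

# Accuracy (%) per language, copied from the X-CSQA table in this card.
languages = ["Sw", "Ur", "Hi", "Ar", "Vi", "Ja", "Pl", "Zh",
             "Nl", "Ru", "It", "De", "Pt", "Fr", "Es", "En"]
llama2_7b = [23.2, 24.7, 32.9, 32.4, 51.0, 50.0, 51.5, 55.6,
             56.9, 55.8, 58.8, 59.9, 60.4, 61.8, 61.9, 78.1]
llamax_7b = [43.5, 39.0, 44.1, 45.1, 54.0, 49.9, 54.6, 58.2,
             58.9, 57.1, 59.1, 59.0, 60.9, 61.6, 62.7, 74.0]

# Per-language gain of LLaMAX-7B-X-CSQA over Llama2-7B-X-CSQA, in points.
gains = {lang: round(new - base, 1)
         for lang, base, new in zip(languages, llama2_7b, llamax_7b)}

# Averages rounded to one decimal, as in the "Avg." column.
avg_base = round(mean(llama2_7b), 1)
avg_new = round(mean(llamax_7b), 1)
avg_gain = round(avg_new - avg_base, 1)

# Languages where the LLaMAX-based model scores lower.
regressions = [lang for lang, gain in gains.items() if gain < 0]

print(f"average: {avg_base} -> {avg_new} ({avg_gain:+.1f})")
print("gains:", gains)
print("regressions:", regressions)
```

Read this way, the improvement is concentrated in the lower-resource languages (Sw +20.3, Ur +14.3, Ar +12.7, Hi +11.2), while Ja, De, and Fr are within one point of the Llama-2 baseline and En is 4.1 points lower.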
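### Model Usage

Code Example:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

# PATH_TO_CONVERTED_WEIGHTS and PATH_TO_CONVERTED_TOKENIZER are placeholders:
# replace them with the location of the model weights and tokenizer.
model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

# The question, the lettered options, and the "Answer:" cue form a single prompt string.
query = (
    "What is someone operating a vehicle likely to be accused of after becoming inebriated? "
    "\n Options: A.punish \t B. arrest \t C. automobile accidents \t D. talking nonsense \t E.drunk driving "
    "\n Answer:"
)
inputs = tokenizer(query, return_tensors="pt")

# max_new_tokens limits only the generated continuation; max_length=30 would also
# count the prompt tokens, and this prompt is already longer than 30 tokens.
generate_ids = model.generate(inputs.input_ids, max_new_tokens=30)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
# => the decoded text is the prompt followed by the model's answer: E
```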
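For more than one question, the prompt can be assembled programmatically. The helpers below are a minimal sketch: `build_prompt` and `extract_choice` are hypothetical names, and the template is inferred from the single example query above rather than taken from the training code. It normalizes the option labels to `A. text`, whereas the example query mixes `A.punish` and `B. arrest`, so adjust the spacing if exact agreement with that example matters.

```python
import re
import string


def build_prompt(question, options):
    """Format a multiple-choice question in the style of the example query.

    Assumption: the layout (question, "Options:", tab-separated lettered
    options, "Answer:") is inferred from the one example in this card.
    """
    labeled = [f"{letter}. {text}"
               for letter, text in zip(string.ascii_uppercase, options)]
    return f"{question} \n Options: " + " \t ".join(labeled) + " \n Answer:"


def extract_choice(decoded, num_options):
    """Return the first option letter found after the last "Answer:", or None.

    If "Answer:" does not occur, the whole string is searched.
    """
    continuation = decoded.rsplit("Answer:", 1)[-1]
    valid = string.ascii_uppercase[:num_options]
    match = re.search(rf"\b([{valid}])\b", continuation)
    return match.group(1) if match else None


options = ["punish", "arrest", "automobile accidents",
           "talking nonsense", "drunk driving"]
prompt = build_prompt(
    "What is someone operating a vehicle likely to be accused of "
    "after becoming inebriated?",
    options,
)
print(repr(prompt))

# Parsing a decoded string of the form "<prompt> <answer>"; here the answer
# is appended by hand to show the parsing step, not produced by the model.
print(extract_choice(prompt + " E", num_options=len(options)))
```

The string returned by `build_prompt` can be passed to the tokenizer in place of `query`, and `extract_choice` can be applied to the decoded output of `model.generate`.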

### Citation

If our model helps your work, please cite this paper:

```
@misc{lu2024llamaxscalinglinguistichorizons,
      title={LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages},
      author={Yinquan Lu and Wenhao Zhu and Lei Li and Yu Qiao and Fei Yuan},
      year={2024},
      eprint={2407.05975},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.05975},
}
```