
Model Sources

Paper: https://arxiv.org/abs/2407.05975

Model Description

🔥 LLaMAX-7B-X-CSQA is a commonsense reasoning model with multilingual capability. It was built by fully fine-tuning the powerful multilingual model LLaMAX-7B on five English commonsense reasoning datasets: X-CSQA, ARC-Easy, ARC-Challenge, OpenBookQA, and QASC.

🔥 Compared with fine-tuning Llama-2-7B under the same setting, LLaMAX-7B-X-CSQA improves average accuracy on the X-CSQA benchmark by 4.2 points (55.1 vs. 50.9).

Experiments

| X-CSQA (Acc. %) | Avg. | Sw | Ur | Hi | Ar | Vi | Ja | Pl | Zh | Nl | Ru | It | De | Pt | Fr | Es | En |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama2-7B-X-CSQA | 50.9 | 23.2 | 24.7 | 32.9 | 32.4 | 51.0 | 50.0 | 51.5 | 55.6 | 56.9 | 55.8 | 58.8 | 59.9 | 60.4 | 61.8 | 61.9 | 78.1 |
| LLaMAX-7B-X-CSQA | 55.1 | 43.5 | 39.0 | 44.1 | 45.1 | 54.0 | 49.9 | 54.6 | 58.2 | 58.9 | 57.1 | 59.1 | 59.0 | 60.9 | 61.6 | 62.7 | 74.0 |

Model Usage

Code Example:

from transformers import AutoTokenizer, LlamaForCausalLM

# Load the fine-tuned model and its tokenizer.
model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

# Multiple-choice prompt: question, tab-separated options, and an "Answer:" cue.
query = "What is someone operating a vehicle likely to be accused of after becoming inebriated? \n Options: A. punish \t B. arrest \t C. automobile accidents \t D. talking nonsense \t E. drunk driving \n Answer:"
inputs = tokenizer(query, return_tensors="pt")

# Generate a short continuation containing the predicted option letter.
generate_ids = model.generate(inputs.input_ids, max_new_tokens=10)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
# => E
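
The same prompt format can be used to probe the model's multilingual ability. The snippet below is a minimal sketch, not part of the original card: it reuses the model and tokenizer loaded above and poses the question in Chinese, where the translated wording is our own illustrative assumption rather than an official X-CSQA item.

# Illustrative multilingual query (assumed translation, not an official X-CSQA item).
query_zh = "喝醉后驾驶车辆的人最可能被指控什么? \n Options: A. 惩罚 \t B. 逮捕 \t C. 交通事故 \t D. 胡言乱语 \t E. 酒后驾驶 \n Answer:"
inputs_zh = tokenizer(query_zh, return_tensors="pt")
generate_ids_zh = model.generate(inputs_zh.input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(generate_ids_zh, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])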

Citation

If our model helps your work, please cite this paper:

@misc{lu2024llamaxscalinglinguistichorizons,
      title={LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages}, 
      author={Yinquan Lu and Wenhao Zhu and Lei Li and Yu Qiao and Fei Yuan},
      year={2024},
      eprint={2407.05975},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.05975}, 
}