LLaMAX2-7B-X-CSQA

	---
	tags:
	- Multilingual
	license: mit
	---

	### Model Sources
	- Paper: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
	- Link: https://arxiv.org/pdf/2407.05975
	- Repository: https://github.com/CONE-MT/LLaMAX/

	### Model Description

	🔥 LLaMAX-7B-X-CSQA is a multilingual commonsense reasoning model. It is obtained by fully fine-tuning the multilingual model [LLaMAX-7B](https://huggingface.co/LLaMAX/LLaMAX-7B) on five English commonsense reasoning datasets: X-CSQA, ARC-Easy, ARC-Challenge, OpenBookQA, and QASC.

	🔥 Compared with Llama-2 fine-tuned under the same setting, LLaMAX-7B-X-CSQA improves average accuracy on the X-CSQA dataset by 4.2 points (from 50.9 to 55.1).


	### Experiments


	| X-CSQA | Avg. | Sw | Ur | Hi | Ar | Vi | Ja | Pl | Zh | Nl | Ru | It | De | Pt | Fr | Es | En |
	|------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
	| Llama2-7B-X-CSQA | 50.9 | 23.2 | 24.7 | 32.9 | 32.4 | 51.0 | 50.0 | 51.5 | 55.6 | 56.9 | 55.8 | 58.8 | 59.9 | 60.4 | 61.8 | 61.9 | 78.1 |
	| LLaMAX-7B-X-CSQA | 55.1 | 43.5 | 39.0 | 44.1 | 45.1 | 54.0 | 49.9 | 54.6 | 58.2 | 58.9 | 57.1 | 59.1 | 59.0 | 60.9 | 61.6 | 62.7 | 74.0 |
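
As a quick sanity check on the table above, the following sketch recomputes the per-model averages and the per-language differences. The numbers are copied from the table; the variable and function names are illustrative and not part of the LLaMAX codebase.

```python
# Accuracy values copied from the X-CSQA table above (names are illustrative).
LANGS = ["Sw", "Ur", "Hi", "Ar", "Vi", "Ja", "Pl", "Zh",
         "Nl", "Ru", "It", "De", "Pt", "Fr", "Es", "En"]
LLAMA2 = [23.2, 24.7, 32.9, 32.4, 51.0, 50.0, 51.5, 55.6,
          56.9, 55.8, 58.8, 59.9, 60.4, 61.8, 61.9, 78.1]
LLAMAX = [43.5, 39.0, 44.1, 45.1, 54.0, 49.9, 54.6, 58.2,
          58.9, 57.1, 59.1, 59.0, 60.9, 61.6, 62.7, 74.0]


def mean_accuracy(scores):
    """Unweighted mean over languages, rounded to one decimal like the table."""
    return round(sum(scores) / len(scores), 1)


def per_language_gain(baseline, model):
    """Difference (model minus baseline) for each language column."""
    return {lang: round(m - b, 1) for lang, b, m in zip(LANGS, baseline, model)}


avg_llama2 = mean_accuracy(LLAMA2)
avg_llamax = mean_accuracy(LLAMAX)
gains = per_language_gain(LLAMA2, LLAMAX)

print(avg_llama2, avg_llamax, round(avg_llamax - avg_llama2, 1))
print(max(gains, key=gains.get), min(gains, key=gains.get))
```

Read this way, the largest gains fall on the Sw, Ur, Hi, and Ar columns, while the En column drops from 78.1 to 74.0.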

	### Model Usage

	Code Example:
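	```python
	from transformers import AutoTokenizer, LlamaForCausalLM

	# PATH_TO_CONVERTED_WEIGHTS and PATH_TO_CONVERTED_TOKENIZER are placeholders:
	# set them to a local checkpoint directory or a Hub model ID.
	model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
	tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

	query = (
	    "What is someone operating a vehicle likely to be accused of after becoming inebriated? "
	    "\n Options: A.punish \t B. arrest \t C. automobile accidents \t D. talking nonsense \t E.drunk driving "
	    "\n Answer:"
	)
	inputs = tokenizer(query, return_tensors="pt")

	# The prompt alone is likely longer than 30 tokens, so cap the number of new
	# tokens instead of the total length (the value 5 is an illustrative choice).
	generate_ids = model.generate(**inputs, max_new_tokens=5)
	# Decode only the continuation, not the echoed prompt.
	new_tokens = generate_ids[:, inputs.input_ids.shape[1]:]
	print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
	# Expected answer according to the original example: E
	```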
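
The query in the example above follows a "question, options, `Answer:`" layout. The sketch below shows one way to build such a prompt for other questions and to pull the answer letter out of the generated text. The helper names `build_prompt` and `extract_choice` are hypothetical, and the exact spacing is an assumption: the card does not specify the template used during fine-tuning, and the example itself mixes `A.punish` with `B. arrest`.

```python
import re
import string


def build_prompt(question, options):
    """Format a multiple-choice question in the layout of the card's example.

    Assumption: options are labelled A, B, C, ... and separated by " \\t ";
    the exact template used for fine-tuning is not documented in the card.
    """
    labelled = [f"{string.ascii_uppercase[i]}. {opt}" for i, opt in enumerate(options)]
    return f"{question} \n Options: " + " \t ".join(labelled) + " \n Answer:"


def extract_choice(generated_text):
    """Return the first standalone option letter (A-E) in the text, or None."""
    match = re.search(r"\b([A-E])\b", generated_text)
    return match.group(1) if match else None


prompt = build_prompt(
    "What is someone operating a vehicle likely to be accused of after becoming inebriated?",
    ["punish", "arrest", "automobile accidents", "talking nonsense", "drunk driving"],
)
print(prompt)
print(extract_choice(" E"))
```

`extract_choice` is meant to be applied to the model's continuation only, not to the full prompt, since the prompt itself contains every option letter.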

	### Citation
	If our model helps your work, please cite this paper:

	```
	@misc{lu2024llamaxscalinglinguistichorizons,
	title={LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages},
	author={Yinquan Lu and Wenhao Zhu and Lei Li and Yu Qiao and Fei Yuan},
	year={2024},
	eprint={2407.05975},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2407.05975},
	}
	```