---
datasets:
- gsarti/clean_mc4_it
- Chat-Error/wizard_alpaca_dolly_orca
- mlabonne/orpo-dpo-mix-40k
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_creator: Marco Polignano - SWAP Research Group
language:
- en
- it
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- llamantino
library_name: transformers
license: llama3
---
<img src="https://cdn-uploads.huggingface.co/production/uploads/5df8bb21da6d0311fd3d540f/xL6Ax1I34qfC4VPKEFA6Z.png" alt="llamantino3_anita" border="0" width="800px">
<hr>
<h3><i>"Built with <b>Meta Llama 3</b>".</i></h3>
<p style="text-align:justify;"><b>LLaMAntino-3-ANITA-8B-Inst-DPO-ITA</b> is a model of the <a href="https://huggingface.co/swap-uniba"><b>LLaMAntino</b></a> - <i>Large Language Models family</i>.
The model is an instruction-tuned version of <a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct"><b>Meta-Llama-3-8B-Instruct</b></a> (a fine-tuned <b>LLaMA 3 model</b>).
This model version aims to be a <b>Multilingual Model</b> (EN 🇺🇸 + ITA 🇮🇹), suitable for further fine-tuning on specific tasks in Italian.</p>
The 🌟**ANITA project**🌟 *(**A**dvanced **N**atural-based interaction for the **ITA**lian language)*
aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.
<hr>
## Model Details
<img src="https://static.vecteezy.com/system/resources/previews/016/833/880/large_2x/github-logo-git-hub-icon-with-text-on-white-background-free-vector.jpg" width="200"> [https://github.com/marcopoli/LLaMAntino-3-ANITA](https://github.com/marcopoli/LLaMAntino-3-ANITA)<br>
<br>
- [**Full Model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA**](https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA)
- ExLlamaV2 - **3.0bpw model**
- ExLlamaV2 - **4.0bpw model**
- ExLlamaV2 - **4.5bpw model**
- ExLlamaV2 - **measurement.json**
<hr>
## Specifications
- **Model developers**: <br><a href="https://marcopoli.github.io/">Ph.D. Marco Polignano</a> - University of Bari Aldo Moro, Italy <br> <a href="https://huggingface.co/swap-uniba">SWAP Research Group</a> <br>
- **Variations**: The model was **supervised fine-tuned (SFT)** with **QLoRA** (4-bit) on instruction-based datasets. A **DPO** step over the *mlabonne/orpo-dpo-mix-40k* dataset was then applied to align the model with human preferences for helpfulness and safety (a rough sketch of this stage follows this list).
- **Input**: The model accepts text input only.
- **Language**: Multilingual + Italian 🇮🇹
- **Output**: The model generates text and code only.
- **Model Architecture**: *Llama 3 architecture*.
- **Context length**: 8K (8,192 tokens).
- **Library Used**: [LLaMA.cpp](https://github.com/ggerganov/llama.cpp)
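
As a rough illustration of the DPO alignment stage described in the *Variations* item above, here is a minimal sketch using the `trl` library with a LoRA adapter on a toy preference set. The hyperparameters, the toy data, and the exact `DPOConfig`/`DPOTrainer` argument names are assumptions (they vary across `trl` releases) and do not reproduce the actual training recipe.

```python
# Rough, hypothetical sketch of the DPO alignment stage only (the QLoRA SFT stage
# is not shown, and these hyperparameters do not reproduce the actual recipe).
# Assumes recent transformers, peft, and trl; argument names differ between
# trl releases.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Toy preference data in prompt/chosen/rejected form; the card's actual DPO data
# is mlabonne/orpo-dpo-mix-40k.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is the capital of Italy?"],
    "chosen":   ["The capital of Italy is Rome."],
    "rejected": ["No idea."],
})

peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

args = DPOConfig(
    output_dir="anita-dpo-sketch",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    beta=0.1,  # DPO preference temperature
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # tokenizer= in older trl versions
    peft_config=peft_config,
)
trainer.train()
```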
<hr>
### Prompt Template
```
<|start_header_id|>system<|end_header_id|>
{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>
{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{ ASSIST Prompt }<|eot_id|>
```
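
For convenience, a minimal sketch of how this template can be rendered in practice: the tokenizer's built-in chat template emits the same header/`<|eot_id|>` structure, so prompts do not need to be assembled by hand. The message contents below are placeholders, and the tokenizer is assumed to come from the full-model repository linked above.

```python
# Minimal sketch: rendering the prompt template above with the model's tokenizer.
from transformers import AutoTokenizer

model_id = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant that answers in Italian."},
    {"role": "user", "content": "Briefly explain what fine-tuning is."},
]

# apply_chat_template emits the <|start_header_id|>/<|eot_id|> structure shown above
# and, with add_generation_prompt=True, appends the assistant header for generation.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```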
<hr>
## ExLlamaV2
[ExLlamaV2](https://github.com/turboderp/exllamav2) is the tool used to quantize the model into the **EXL2 format** files listed above.
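
Below is a hedged sketch of loading one of the EXL2 quantizations with the `exllamav2` Python API. The class and method names follow the library's published example scripts and may differ between releases; the local model path is a placeholder.

```python
# Sketch: loading an EXL2 quantization with exllamav2 (API mirrors the library's
# example scripts; exact names can change between exllamav2 releases).
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

model_dir = "./LLaMAntino-3-ANITA-8B-Inst-DPO-ITA-exl2-4.0bpw"  # placeholder local path

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)            # spread layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

# Prompt assembled with the template shown earlier in this card.
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Who are you?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(generator.generate_simple(prompt, settings, 128))
```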
## Citation instructions
```bibtex
@misc{polignano2024advanced,
title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA},
author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
year={2024},
eprint={2405.07101},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```bibtex
@misc{basile2023llamantino,
title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
year={2023},
eprint={2312.09993},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```bibtex
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```