---
datasets:
- gsarti/clean_mc4_it
- Chat-Error/wizard_alpaca_dolly_orca
- mlabonne/orpo-dpo-mix-40k
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_creator: Marco Polignano - SWAP Research Group
language:
- en
- it
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- llamantino
library_name: transformers
license: llama3
---
<img src="https://cdn-uploads.huggingface.co/production/uploads/5df8bb21da6d0311fd3d540f/xL6Ax1I34qfC4VPKEFA6Z.png" alt="llamantino3_anita" border="0" width="800px">
<hr>
<!--<img src="https://i.ibb.co/6mHSRm3/llamantino53.jpg" width="200"/>-->
<h3><i>"Built with <b>Meta Llama 3</b>".</i></h3>
<p style="text-align:justify;"><b>LLaMAntino-3-ANITA-8B-Inst-DPO-ITA</b> is a model of the <a href="https://huggingface.co/swap-uniba"><b>LLaMAntino</b></a> - <i>Large Language Models family</i>.
The model is an instruction-tuned version of <a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct"><b>Meta-Llama-3-8b-instruct</b></a> (a fine-tuned <b>LLaMA 3 model</b>).
This model version aims to be a <b>Multilingual Model</b> (EN 🇺🇸 + ITA 🇮🇹) suitable for further fine-tuning on specific tasks in Italian.</p>


The **ANITA project** *(**A**dvanced **N**atural-based interaction for the **ITA**lian language)*
aims to provide Italian NLP researchers with an improved model for Italian language 🇮🇹 use cases.

<hr>

## Model Details

<img src="https://static.vecteezy.com/system/resources/previews/016/833/880/large_2x/github-logo-git-hub-icon-with-text-on-white-background-free-vector.jpg" width="200"> [https://github.com/marcopoli/LLaMAntino-3-ANITA](https://github.com/marcopoli/LLaMAntino-3-ANITA)<br>

<br>

- [**Full Model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA**](https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA)
- LLaMA.cpp - **F16 model**
- LLaMA.cpp - **Q8_0 model**
- LLaMA.cpp - **Q4_K_M model**
- LLaMA.cpp - **Q2_K model**

<hr>

## Specifications

- **Model developers**: <br><a href="https://marcopoli.github.io/">Ph.D. Marco Polignano</a> - University of Bari Aldo Moro, Italy <br> <a href="https://huggingface.co/swap-uniba">SWAP Research Group</a> <br>
- **Variations**: The released model was trained with **supervised fine-tuning (SFT)** using 4-bit **QLoRA** on instruction-based datasets. **DPO** over the *mlabonne/orpo-dpo-mix-40k* dataset is used to align the model with human preferences for helpfulness and safety.
- **Input**: The model accepts text input only.
- **Language**: Multilingual + Italian 🇮🇹
- **Output**: The model generates text and code only.
- **Model Architecture**: *Llama 3 architecture*.
- **Context length**: 8K (8192 tokens).
- **Library Used**: [LLaMA.cpp](https://github.com/ggerganov/llama.cpp)
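
The DPO step mentioned under **Variations** optimizes a preference objective over pairs of chosen and rejected answers. As a rough illustration only (this is not the project's training code), the per-pair loss from the standard DPO formulation can be sketched in plain Python. The log-probability values and `beta=0.1` below are assumed placeholders, not settings reported for this model.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Return -log(sigmoid(x)) in a numerically stable way."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full answer under the
    trained policy or the frozen reference model. `beta` (assumed value)
    controls how strongly the policy is kept close to the reference.
    """
    # How much more the policy prefers each answer than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = chosen_reward - rejected_reward
    return neg_log_sigmoid(beta * margin)


# Hypothetical log-probabilities: the policy favors the chosen answer
# more than the reference does, so the loss falls below log(2).
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals `log(2)`; it shrinks as the policy shifts probability toward the chosen answer.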

<hr>

### Prompt Template
```
<|start_header_id|>system<|end_header_id|>

{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>

{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ ASSIST Prompt }<|eot_id|>
```
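
To make the template concrete, here is a minimal sketch of a helper that fills it in for one system message and one user message, leaving the assistant header open so that the model writes the reply. The function name `build_llama3_prompt` and the example strings are illustrative assumptions. The sketch also assumes that any beginning-of-text token is added by the tokenizer or runtime, since the template above does not include one.

```python
def build_llama3_prompt(system_prompt: str, user_prompt: str) -> str:
    """Fill in the prompt template for a single system + user turn.

    The returned string ends with an open assistant header, so the
    model's generated text takes the place of { ASSIST Prompt }.
    """
    def turn(role: str, content: str) -> str:
        # One complete turn: role header, blank line, content, end-of-turn.
        return (f"<|start_header_id|>{role}<|end_header_id|>\n\n"
                f"{content}<|eot_id|>")

    return (turn("system", system_prompt)
            + turn("user", user_prompt)
            + "<|start_header_id|>assistant<|end_header_id|>\n\n")


# Hypothetical prompts, used only to show the resulting layout.
print(build_llama3_prompt(
    "Sei un assistente AI per la lingua italiana.",
    "Chi era Carlo Magno?",
))
```

Generation is normally stopped when the model emits `<|eot_id|>`, which closes the assistant turn.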

<hr>

## LLaMA.cpp

<img src="https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png" width="200px" align="center" />

[LLaMA.cpp](https://github.com/ggerganov/llama.cpp) is a great tool that makes it easy to quantize a model into the **GGUF format**.
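
As a minimal sketch of how the quantized variants listed under Model Details could be produced from an F16 GGUF file, the snippet below only builds the command lines and does not run them. It assumes a local llama.cpp build whose quantization tool is named `llama-quantize` (older builds call it `quantize`), and it uses a hypothetical input file name. It is not the authors' actual conversion script.

```python
from pathlib import Path

# Quantization types listed under "Model Details"; F16 is the source file.
QUANT_TYPES = ["Q8_0", "Q4_K_M", "Q2_K"]


def quantize_commands(f16_gguf: str,
                      quantize_bin: str = "llama-quantize") -> list[list[str]]:
    """Build one `<tool> <input> <output> <type>` command per quant type.

    `quantize_bin` is an assumed tool name; adjust it to the local build.
    """
    src = Path(f16_gguf)
    commands = []
    for qtype in QUANT_TYPES:
        # Output sits next to the input, e.g. model-f16.Q4_K_M.gguf
        dst = src.with_name(f"{src.stem}.{qtype}.gguf")
        commands.append([quantize_bin, str(src), str(dst), qtype])
    return commands


# Hypothetical file name, printed as shell-style command lines.
for cmd in quantize_commands("anita-8b-f16.gguf"):
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the tool name and paths match the local setup.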

## Citation instructions
```bibtex
@misc{polignano2024advanced,
    title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA},
    author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
    year={2024},
    eprint={2405.07101},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

```bibtex
@misc{basile2023llamantino,
    title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
    author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
    year={2023},
    eprint={2312.09993},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

```bibtex
@article{llama3modelcard,
    title={Llama 3 Model Card},
    author={AI@Meta},
    year={2024},
    url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```