--- datasets: - gsarti/clean_mc4_it - Chat-Error/wizard_alpaca_dolly_orca - mlabonne/orpo-dpo-mix-40k base_model: meta-llama/Meta-Llama-3-8B-Instruct model_creator: Marco Polignano - SWAP Research Group language: - en - it metrics: - accuracy pipeline_tag: text-generation tags: - facebook - meta - pythorch - llama - llama-3 - llamantino library_name: transformers license: llama3 --- llamantino3_anita

LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino - Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8b-instruct (a fine-tuned LLaMA 3 model). This model version aims to be the a Multilingual Model 🏁 (EN 🇺🇸 + ITA🇮🇹) to further fine-tuning on Specific Tasks in Italian.

The 🌟**ANITA project**🌟 *(**A**dvanced **N**atural-based interaction for the **ITA**lian language)* wants to provide Italian NLP researchers with an improved model for the Italian Language 🇮🇹 use cases.

**Live DEMO:** [https://chat.llamantino.it/](https://chat.llamantino.it/)
*It works only with Italian connection.*

## Model Details *Last Update: 10/05/2024*

https://github.com/marcopoli/LLaMAntino-3-ANITA
| Model | HF | GGUF | EXL2 | |-------|-------|-------|-------| | *swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA* | [Link](https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA) | [Link](https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA_GGUF) | [Link](https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA_EXL2) |

## Specifications - **Model developers**:
Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy
SWAP Research Group
- **Variations**: The model release has been **supervised fine-tuning (SFT)** using **QLoRA** 4bit, on instruction-based datasets. **DPO** approach over the *mlabonne/orpo-dpo-mix-40k* dataset is used to align with human preferences for helpfulness and safety. - **Input**: Models input text only. - **Language**: Multilingual 🏁 + Italian 🇮🇹 - **Output**: Models generate text and code only. - **Model Architecture**: *Llama 3 architecture*. - **Context length**: 8K, 8192. - **Library Used**: [Unsloth](https://unsloth.ai/)

## Playground To use the model directly, there are many ways to get started, choose one of the following ways to experience it. ### Prompt Template ``` <|start_header_id|>system<|end_header_id|> { SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|> { USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|> { ASSIST Prompt }<|eot_id|> ```` ### Transformers For direct use with `transformers`, you can easily get started with the following steps. - Firstly, you need to install transformers via the command below with `pip`. ```bash pip install -U transformers trl peft accelerate bitsandbytes ``` - Right now, you can start using the model directly. ```python import torch from transformers import ( AutoModelForCausalLM, AutoTokenizer, ) base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA" model = AutoModelForCausalLM.from_pretrained( base_model, torch_dtype=torch.bfloat16, device_map="auto", ) tokenizer = AutoTokenizer.from_pretrained(base_model) sys = "Sei un an assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \ "(Advanced Natural-based interaction for the ITAlian language)." \ " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo." messages = [ {"role": "system", "content": sys}, {"role": "user", "content": "Chi è Carlo Magno?"} ] #Method 1 prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False) for k,v in inputs.items(): inputs[k] = v.cuda() outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6) results = tokenizer.batch_decode(outputs)[0] print(results) #Method 2 import transformers pipe = transformers.pipeline( model=model, tokenizer=tokenizer, return_full_text=False, # langchain expects the full text task='text-generation', max_new_tokens=512, # max number of tokens to generate in the output temperature=0.6, #temperature for more or less creative answers do_sample=True, top_p=0.9, ) sequences = pipe(messages) for seq in sequences: print(f"{seq['generated_text']}") ``` - Additionally, you can also use a model with **4bit quantization** to reduce the required resources at least. You can start with the code below. ```python import torch from transformers import ( AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, ) base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA" bnb_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=False, ) model = AutoModelForCausalLM.from_pretrained( base_model, quantization_config=bnb_config, device_map="auto", ) tokenizer = AutoTokenizer.from_pretrained(base_model) sys = "Sei un an assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \ "(Advanced Natural-based interaction for the ITAlian language)." \ " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo." messages = [ {"role": "system", "content": sys}, {"role": "user", "content": "Chi è Carlo Magno?"} ] #Method 1 prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False) for k,v in inputs.items(): inputs[k] = v.cuda() outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6) results = tokenizer.batch_decode(outputs)[0] print(results) #Method 2 import transformers pipe = transformers.pipeline( model=model, tokenizer=tokenizer, return_full_text=False, # langchain expects the full text task='text-generation', max_new_tokens=512, # max number of tokens to generate in the output temperature=0.6, #temperature for more or less creative answers do_sample=True, top_p=0.9, ) sequences = pipe(messages) for seq in sequences: print(f"{seq['generated_text']}") ```

## Evaluation **Open LLM Leaderboard:** Evaluated with lm-evaluation-benchmark-harness for the [**Open Italian LLMs Leaderboard**](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard) ``` lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks hellaswag_it,arc_it --device cuda:0 --batch_size auto:2 lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size auto:2 ``` | Metric | Value | |-----------------------|---------------------------| | Avg. | **0.6160** | | Arc_IT | 0.5714 | | Hellaswag_IT | 0.7093 | | MMLU_IT | 0.5672 |

## Unsloth

[Unsloth](https://unsloth.ai), a great tool that helps us easily develop products, at a lower cost than expected. ## Citation instructions ```bibtex @misc{polignano2024advanced, title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA}, author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro}, year={2024}, eprint={2405.07101}, archivePrefix={arXiv}, primaryClass={cs.CL} } ``` ```bibtex @misc{basile2023llamantino, title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language}, author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro}, year={2023}, eprint={2312.09993}, archivePrefix={arXiv}, primaryClass={cs.CL} } ``` ```bibtex @article{llama3modelcard, title={Llama 3 Model Card}, author={AI@Meta}, year={2024}, url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md} } ``` # Acknowledgments We acknowledge the support of the PNRR project [FAIR - Future AI Research (PE00000013)](https://fondazione-fair.it/en/foundation/), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by the NextGenerationEU. Models are built on the Leonardo supercomputer with the support of CINECA-Italian Super Computing Resource Allocation, class C project IscrC\_Pro\_MRS (HP10CQO70G).