llamantino3_anita

"Built with Meta Llama 3".

LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino - Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8b-instruct (a fine-tuned LLaMA 3 model). This model version aims to be a multilingual model (EN 🇺🇸 + ITA 🇮🇹) suitable for further fine-tuning on specific tasks in Italian.

The 🌟ANITA project🌟 *(Advanced Natural-based interaction for the ITAlian language)* aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.


Model Details

https://github.com/marcopoli/LLaMAntino-3-ANITA



Specifications

  • Model developers:
    Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy
    SWAP Research Group
  • Variations: The released model was trained with supervised fine-tuning (SFT) using 4-bit QLoRA on instruction-based datasets. A DPO approach over the mlabonne/orpo-dpo-mix-40k dataset was then used to align it with human preferences for helpfulness and safety.
  • Input: The model accepts text input only.
  • Language: Multilingual + Italian 🇮🇹
  • Output: The model generates text and code only.
  • Model Architecture: Llama 3 architecture.
  • Context length: 8K (8192 tokens).
  • Library Used: LLaMA.cpp
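
To make the DPO step mentioned under Variations more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. The function name `dpo_loss`, the `beta` value, and the example log-probabilities are illustrative assumptions. They are not the training code or hyperparameters used for this model, which the card does not state.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model.
    beta = 0.1 is an assumed, commonly used default, not a value from this card.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical numbers: the policy favors the chosen answer more than the
# reference does, so the loss falls below log(2), the value at zero margin.
example = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(example < math.log(2))
```

The loss shrinks as the policy raises the likelihood of the preferred response relative to the reference model and lowers that of the rejected one.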

Prompt Template

<|start_header_id|>system<|end_header_id|>

{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>

{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ ASSIST Prompt }<|eot_id|>
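
As a rough illustration of how the template above can be filled in, the following sketch assembles the prompt string by hand. The helper name `build_prompt` and the example messages are hypothetical. The sketch reproduces only the tokens shown in the template, so whether a beginning-of-text token also needs to be prepended depends on the tokenizer or runtime in use.

```python
from typing import Optional


def build_prompt(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Fill the prompt template with a system and a user message.

    If `assistant` is None, the string ends right after the assistant header,
    which is the form to use when asking the model to generate a reply.
    """
    prompt = (
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if assistant is not None:
        # A completed turn, e.g. when formatting training examples.
        prompt += f"{assistant}<|eot_id|>"
    return prompt


# Hypothetical example messages.
print(build_prompt(
    system="Sei un assistente utile che risponde in italiano.",
    user="Chi era Carlo Magno?",
))
```

In practice, a tokenizer's built-in chat template, where available, is usually a safer choice than manual string assembly, because it keeps special tokens consistent with the model.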

ExLlamaV2

ExLlamaV2 is a tool that makes it easy to quantize a model into the EXL2 format.

Citation instructions

@misc{polignano2024advanced,
      title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA}, 
      author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
      year={2024},
      eprint={2405.07101},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{basile2023llamantino,
      title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language}, 
      author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
      year={2023},
      eprint={2312.09993},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@article{llama3modelcard,
      title={Llama 3 Model Card},
      author={AI@Meta},
      year={2024},
      url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
