m-polignano-uniba committed defc44a (parent: c26caae): Create README.md

---
datasets:
- gsarti/clean_mc4_it
- Chat-Error/wizard_alpaca_dolly_orca
- jondurbin/truthy-dpo-v0.1
- mlabonne/orpo-dpo-mix-40k
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_creator: Marco Polignano - SWAP Research Group
language:
- en
- it
metrics:
- accuracy
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- llamantino
library_name: transformers
license: llama3
---
<img src="https://cdn-uploads.huggingface.co/production/uploads/5df8bb21da6d0311fd3d540f/cZoZdwQOPdQsnQmDXHcSn.png" alt="llamantino3_anita" border="0" width="800px">
<hr>
<!--<img src="https://i.ibb.co/6mHSRm3/llamantino53.jpg" width="200"/>-->

<p style="text-align:justify;"><b>LLaMAntino-3-ANITA-8B-Inst-DPO-ITA</b> is a model of the <a href="https://huggingface.co/swap-uniba"><b>LLaMAntino</b></a> - <i>Large Language Models family</i>.
The model is an instruction-tuned version of <a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct"><b>Meta-Llama-3-8b-instruct</b></a> (a fine-tuned <b>LLaMA 3 model</b>).
This model version aims to be a <b>Multilingual Model</b> 🏁 (EN 🇺🇸 + ITA 🇮🇹), suitable for further fine-tuning on specific tasks in Italian.</p>


The 🌟**ANITA project**🌟 *(**A**dvanced **N**atural-based interaction for the **ITA**lian language)*
aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.

+ <hr>
38
+
39
+ ## Model Details
40
+
41
+ <img src="https://static.vecteezy.com/system/resources/previews/016/833/880/large_2x/github-logo-git-hub-icon-with-text-on-white-background-free-vector.jpg" width="200"> [https://github.com/marcopoli/LLaMAntino-3-ANITA](https://github.com/marcopoli/LLaMAntino-3-ANITA)<br>
42
+
43
+ <br>
44
+
45
+ - [**Full Model: LaMAntino-3-ANITA-8B-Inst-DPO-ITA**](https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA)
46
+ - ExLlamaV2 - **3.0bpw model**
47
+ - ExLlamaV2 - **4.0bpw model**
48
+ - ExLlamaV2 - **4.5bpw model**
49
+ - ExLlamaV2 - **measurement.json**
50
+
51
+ <hr>
52
+
## Specifications

- **Model developers**: <br><a href="https://marcopoli.github.io/">Ph.D. Marco Polignano</a> - University of Bari Aldo Moro, Italy <br> <a href="https://huggingface.co/swap-uniba">SWAP Research Group</a> <br>
- **Variations**: The model was **supervised fine-tuned (SFT)** with **QLoRA** (4-bit) on two instruction-based datasets, then aligned with human preferences for helpfulness and safety via **DPO** on the *jondurbin/truthy-dpo-v0.1* dataset.
- **Input**: Text only.
- **Language**: Multilingual 🏁 + Italian 🇮🇹
- **Output**: Text and code only.
- **Model Architecture**: *Llama 3 architecture*.
- **Context length**: 8K (8,192 tokens).
- **Library Used**: [LLaMA.cpp](https://github.com/ggerganov/llama.cpp)

<hr>

### Prompt Template
```
<|start_header_id|>system<|end_header_id|>

{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>

{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ ASSIST Prompt }<|eot_id|>
```
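The template above can also be assembled programmatically. A minimal sketch in plain Python, assuming a single system/user turn (`build_prompt` is a hypothetical helper shown for illustration only; in practice, `tokenizer.apply_chat_template` from the `transformers` library produces this format for you):

```python
# Minimal sketch of the Llama-3 chat format used by this model.
# NOTE: build_prompt is a hypothetical illustration; prefer
# tokenizer.apply_chat_template from the transformers library.

def build_prompt(sys_prompt: str, user_prompt: str) -> str:
    """Assemble a single-turn prompt in the Llama-3 chat format,
    leaving the assistant turn open for the model to complete."""
    return (
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{sys_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Example (system prompt is a placeholder, not the one shipped with the model):
prompt = build_prompt(
    "Sei un assistente AI per la lingua Italiana.",
    "Chi era Dante Alighieri?",
)
print(prompt)
```

The assistant header is left open so that generation stops at the model's own `<|eot_id|>`.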

<hr>

## ExLlamaV2

[ExLlamaV2](https://github.com/turboderp/exllamav2) is a great tool that makes it easy to quantize models into the **EXL2 format**.

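As a rough sketch of how such EXL2 quants are produced (paths and the bits-per-weight value are placeholders; the `convert.py` flags reflect the ExLlamaV2 repository at the time of writing and may change between versions — check its README before running):

```shell
# Hypothetical sketch: quantize the full model to 4.0 bits per weight (EXL2).
# All paths below are placeholders.
git clone https://github.com/turboderp/exllamav2
cd exllamav2
pip install -r requirements.txt
python convert.py \
    -i /path/to/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA \
    -o /path/to/working_dir \
    -cf /path/to/output-4.0bpw \
    -b 4.0
```

The `measurement.json` listed above is the calibration measurement file that ExLlamaV2 produces during conversion; it can be reused (via the `-m` flag) to quantize the same model to other bitrates without re-measuring.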
## Citation instructions
```bibtex
@misc{basile2023llamantino,
  title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
  author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
  year={2023},
  eprint={2312.09993},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

```bibtex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```