---
license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum
---

<img src="./assets/phi_35_mini_ita.png" width="450"/>
# Phi-3.5-mini-ITA

Fine-tuned version of [Microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) optimized for better performance in Italian.

- Small yet powerful model with 3.82 billion parameters
- Supports 128k context length

[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

## 🏆 Evaluation

| Model                                 | Parameters | Average   | MMLU_IT | ARC_IT | HELLASWAG_IT |
| ------------------------------------- | ---------- | --------- | ------- | ------ | ------------ |
| **anakin87/Phi-3.5-mini-ITA**         | **3.82 B** | **57.67** | 59.93   | 51.50  | 61.57        |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B     | 56.97     | 58.43   | 48.42  | 64.07        |
| microsoft/Phi-3.5-mini-instruct       | 3.82 B     | 56.82     | 60.03   | 49.19  | 61.25        |

For a detailed comparison of model performance, check out the [Leaderboard for Italian Language Models](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard).

## 🎮 Model in action
### Demo
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

### Text generation with Transformers
The model is small, so it runs smoothly on Colab. You can also load it with quantization to reduce memory usage (a sketch is shown after the example output below).

With `transformers==4.44.2`, `trust_remote_code=True` is needed to incorporate a minor bug fix in `Phi3ForCausalLM`.
Read [this discussion](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/discussions/9) for more details.

⚡ *The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the `attn_implementation` parameter in the code snippet below.*

```python
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
print(outputs[0]["generated_text"][-1]["content"])
```

Example output:
```
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
```
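
If GPU memory is limited, the model can also be loaded in 4-bit. Here is a minimal sketch using bitsandbytes quantization; the specific `BitsAndBytesConfig` settings (NF4, bfloat16 compute dtype) are suggested defaults, not requirements.

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

# Example 4-bit quantization settings (adjust as needed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
# The pipeline can then be used exactly as in the snippet above.
```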

### Build AI applications
You can use the model to create a variety of AI applications.

I recommend using the [🏗️ Haystack LLM framework](https://haystack.deepset.ai/) for orchestration.
(spoiler: I work on it and it is open-source 😄)

This model is compatible with [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator) and [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) components.
You can also deploy the model with a TGI container and then use it with [`HuggingFaceAPIGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and the related Chat Generator.
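
As a quick orientation, here is a minimal local-generation sketch with `HuggingFaceLocalChatGenerator`; the prompt and generation parameters are illustrative choices, so refer to the Haystack docs linked above for the full API.

```python
# pip install haystack-ai
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

# Load Phi-3.5-mini-ITA locally through the Transformers backend
generator = HuggingFaceLocalChatGenerator(
    model="anakin87/Phi-3.5-mini-ITA",
    generation_kwargs={"max_new_tokens": 500},
)
generator.warm_up()

messages = [ChatMessage.from_user("Scrivi due frasi sul Colosseo.")]
result = generator.run(messages=messages)
print(result["replies"][0].text)  # on older Haystack 2.x releases, use .content instead of .text
```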

Some examples you can draw inspiration from:
- [RAG with local open models](https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2)
- [Summarization from a Website](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/hackernews-custom-component-rag.ipynb)
- [Multilingual RAG](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/multilingual_rag_podcast.ipynb)


## 🔧 Training details
This model was fine-tuned with Hugging Face TRL.
It underwent 2 epochs of instruction fine-tuning on the [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Capybara-Claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) datasets. 🙏 Thanks to the authors for providing these datasets.

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623).
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
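
For intuition only, here is a simplified sketch of the selective-freezing idea. It is not the actual training script, and the module names below are hypothetical placeholders for the layers selected by the SNR analysis.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct", torch_dtype=torch.bfloat16
)

# Hypothetical output of the Spectrum SNR analysis: modules to keep trainable.
high_snr_modules = [
    "model.layers.30.mlp.down_proj",
    "model.layers.31.self_attn.o_proj",
    # ... more selected layers ...
]

# Freeze everything except the parameters of the selected modules.
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(m) for m in high_snr_modules)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
# The partially frozen model can then be trained as usual, e.g. with TRL's SFTTrainer.
```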

Training required about 14 hours on a single A40 GPU.

I may release a guide/tutorial soon. Stay tuned! 📻