munish0838
committed on
Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,128 @@
---
license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/Phi-3.5-mini-ITA-GGUF
This is a quantized version of [anakin87/Phi-3.5-mini-ITA](https://huggingface.co/anakin87/Phi-3.5-mini-ITA), created using llama.cpp.

# Original Model Card

<img src="./assets/phi_35_mini_ita.png" width="450"></img>

# Phi-3.5-mini-ITA

Fine-tuned version of [Microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) optimized for better performance in Italian.

- Small yet powerful model with 3.82 billion parameters
- Supports 128k context length

[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

## Evaluation

| Model                                 | Parameters | Average   | MMLU_IT | ARC_IT | HELLASWAG_IT |
| ------------------------------------- | ---------- | --------- | ------- | ------ | ------------ |
| **anakin87/Phi-3.5-mini-ITA**         | **3.82 B** | **57.67** | 59.93   | 51.5   | 61.57        |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B     | 56.97     | 58.43   | 48.42  | 64.07        |
| microsoft/Phi-3.5-mini-instruct       | 3.82 B     | 56.82     | 60.03   | 49.19  | 61.25        |

For a detailed comparison of model performance, check out the [Leaderboard for Italian Language Models](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard).

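The Average column in the table above appears to be the unweighted mean of the three benchmark scores. The card does not say so explicitly, so treat that as an assumption; the short check below recomputes it from the table values.

```python
# Benchmark scores copied from the table: (MMLU_IT, ARC_IT, HELLASWAG_IT).
scores = {
    "anakin87/Phi-3.5-mini-ITA": (59.93, 51.5, 61.57),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": (58.43, 48.42, 64.07),
    "microsoft/Phi-3.5-mini-instruct": (60.03, 49.19, 61.25),
}

# Assumption: "Average" is the plain mean of the three scores, rounded to 2 decimals.
averages = {
    model: round(sum(values) / len(values), 2) for model, values in scores.items()
}

for model, average in averages.items():
    print(f"{model}: {average:.2f}")
```

The recomputed values (57.67, 56.97, 56.82) match the Average column, which supports the plain-mean reading.
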
## Model in action
### Demo
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

### Text generation with Transformers
The model is small, so it runs smoothly on Colab. It can also be loaded with quantization.

With `transformers==4.44.2`, `trust_remote_code=True` is needed to incorporate a minor bug fix in `Phi3ForCausalLM`.
Read [this discussion](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/discussions/9) for more details.

⚡ *The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the `attn_implementation` parameter in the code snippet below.*

```python
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]
# Pass the chat messages; the pipeline applies the model's chat template.
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
# For chat input, "generated_text" holds the whole conversation; the last entry is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```

Example output:
```
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
```

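The section above notes that the model can also be loaded with quantization but does not show how. One possible approach is 4-bit loading through `bitsandbytes`; the sketch below assumes that route. The specific settings (NF4, bfloat16 compute) and the helper name `load_quantized` are illustrative choices, not something the model card prescribes.

```python
# pip install transformers accelerate bitsandbytes

# Illustrative 4-bit settings (an assumption, not taken from the model card).
QUANT_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",
}


def load_quantized(model_id="anakin87/Phi-3.5-mini-ITA"):
    """Load the model in 4-bit precision (needs a CUDA GPU and bitsandbytes)."""
    # Imported lazily so the heavy dependencies load only when the function runs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    settings = dict(QUANT_SETTINGS)
    # Turn the dtype name into the actual torch dtype object.
    settings["bnb_4bit_compute_dtype"] = getattr(torch, settings["bnb_4bit_compute_dtype"])

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        quantization_config=BitsAndBytesConfig(**settings),
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer
```

Calling `model, tokenizer = load_quantized()` returns objects that can be passed to `pipeline("text-generation", ...)` in the same way as the full-precision model.
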
### Build AI applications
You can use the model to create a variety of AI applications.

I recommend using the [Haystack LLM framework](https://haystack.deepset.ai/) for orchestration.
(spoiler: I work on it and it is open-source)

This model is compatible with [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator) and [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) components.
You can also deploy the model with a TGI container and then use it with [`HuggingFaceAPIGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and the related Chat Generator.

Some examples you can take inspiration from:
- [RAG with local open models](https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2)
- [Summarization from a Website](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/hackernews-custom-component-rag.ipynb)
- [Multilingual RAG](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/multilingual_rag_podcast.ipynb)

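For the `HuggingFaceLocalChatGenerator` route mentioned above, a minimal sketch might look like the following. It assumes Haystack 2.x (`haystack-ai`); the helper name `ask` and the `max_new_tokens` value are illustrative, and version-specific settings such as `trust_remote_code` are left out.

```python
# pip install haystack-ai transformers accelerate


def ask(question, model_id="anakin87/Phi-3.5-mini-ITA", max_new_tokens=500):
    """Send one user message to the model through Haystack and return the replies."""
    # Imported lazily so the heavy dependencies load only when the function runs.
    from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
    from haystack.dataclasses import ChatMessage

    generator = HuggingFaceLocalChatGenerator(
        model=model_id,
        generation_kwargs={"max_new_tokens": max_new_tokens},
    )
    generator.warm_up()  # loads the model and tokenizer
    result = generator.run(messages=[ChatMessage.from_user(question)])
    return result["replies"]  # list of ChatMessage objects
```

Calling `ask("Qual è la capitale d'Italia?")` downloads the model on first use and returns the generated replies as `ChatMessage` objects.
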
## Training details
This model was fine-tuned using HF TRL.
It underwent 2 epochs of instruction fine-tuning on the [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Capybara-Claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) datasets. Thanks to the authors for providing these datasets.

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623).
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and freeze the rest.

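The selection step described above (train the high-SNR layers, freeze the rest) can be sketched in plain Python. This is only an illustration of the idea, not the Spectrum implementation or the configuration used for this model: the layer names, SNR values, and the `top_fraction` parameter are all made up for the example.

```python
def select_trainable(snr_by_layer, top_fraction=0.5):
    """Return the names of the layers with the highest SNR."""
    # Rank layer names from highest to lowest SNR.
    ranked = sorted(snr_by_layer, key=snr_by_layer.get, reverse=True)
    # Keep the top fraction, but always at least one layer.
    keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:keep])


# Hypothetical SNR values, invented for illustration.
snr = {
    "layers.0.mlp.down_proj": 4.1,
    "layers.1.mlp.down_proj": 0.7,
    "layers.2.mlp.down_proj": 2.9,
    "layers.3.mlp.down_proj": 1.2,
}

trainable = select_trainable(snr, top_fraction=0.5)
for name in snr:
    state = "train" if name in trainable else "freeze"
    print(f"{name}: {state}")
```

In a PyTorch training setup, freezing typically means setting `requires_grad = False` on every parameter whose layer is not in the selected set.
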
Training required about 14 hours on a single A40 GPU.

I may release a guide/tutorial soon. Stay tuned!