---
license: mit
datasets:
- mlabonne/FineTome-100k
- efederici/capybara-claude-15k-ita
language:
- it
- en
library_name: transformers
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- trl
- phi3
- spectrum
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/Phi-3.5-mini-ITA-GGUF
This is a quantized version of [anakin87/Phi-3.5-mini-ITA](https://huggingface.co/anakin87/Phi-3.5-mini-ITA), created using llama.cpp.
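To get a feel for the download and memory footprint of a quantized file, a rough estimate is parameter count × bits per weight / 8. The sketch below applies this to the 3.82 billion parameters reported in the original card. The bits-per-weight figures are those of llama.cpp's Q4_0 and Q8_0 block formats, used here only as assumed examples (this card does not list which quantization types the repository provides), and the estimate ignores file metadata and any tensors kept at higher precision.

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Assumptions: every weight uses the same quant type; metadata and tensors
# kept at higher precision are ignored, so real files will differ somewhat.

N_PARAMS = 3.82e9  # parameter count stated in the model card

# Bits per weight for a few llama.cpp formats (32 weights per block):
#   Q4_0: 32 x 4-bit values + one fp16 scale = 18 bytes -> 4.5 bits/weight
#   Q8_0: 32 x 8-bit values + one fp16 scale = 34 bytes -> 8.5 bits/weight
#   F16:  unquantized half precision          -> 16 bits/weight
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q8_0": 8.5, "F16": 16.0}


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{estimated_size_gb(N_PARAMS, bpw):.2f} GB")
```

Under these assumptions, a 4-bit Q4_0 file comes to roughly 2.15 GB, versus about 7.64 GB for 16-bit weights.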

# Original Model Card

<img src="./assets/phi_35_mini_ita.png" width="450"></img>
# Phi-3.5-mini-ITA

Fine-tuned version of [Microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) optimized for better performance in Italian.

- Small yet powerful model with 3.82 billion parameters
- Supports 128k context length
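A 128k-token window has a memory cost beyond the weights: the key/value cache grows linearly with context length. The sketch below estimates it for an unquantized fp16/bf16 cache, assuming the architecture values of the base microsoft/Phi-3.5-mini-instruct (32 layers, 32 key/value heads, head dimension 96). These values are assumptions here; check them against the model's `config.json` before relying on the numbers.

```python
# Back-of-the-envelope KV-cache size for long contexts.
# Assumed architecture values (verify against the model's config.json):
NUM_LAYERS = 32
NUM_KV_HEADS = 32
HEAD_DIM = 3072 // 32  # hidden_size / num_attention_heads = 96


def kv_cache_bytes(n_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer, head and token.

    bytes_per_value=2 corresponds to an fp16/bf16 cache.
    """
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value  # K and V
    return n_tokens * per_token


if __name__ == "__main__":
    for n in (4_096, 32_768, 131_072):
        print(f"{n:>7} tokens: {kv_cache_bytes(n) / 2**30:.1f} GiB")
```

With these values the cache takes 384 KiB per token, so filling the full 131,072-token window would need about 48 GiB on top of the weights.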

[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

## 🏆 Evaluation

| Model                                 | Parameters | Average   | MMLU_IT | ARC_IT | HELLASWAG_IT |
| ------------------------------------- | ---------- | --------- | ------- | ------ | ------------ |
| **anakin87/Phi-3.5-mini-ITA**         | **3.82 B** | **57.67** | 59.93   | 51.5   | 61.57        |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03 B     | 56.97     | 58.43   | 48.42  | 64.07        |
| microsoft/Phi-3.5-mini-instruct       | 3.82 B     | 56.82     | 60.03   | 49.19  | 61.25        |

For a detailed comparison of model performance, check out the [Leaderboard for Italian Language Models](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard).
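The Average column in the table above appears to be the unweighted mean of the three benchmark scores; this short check recomputes it from the per-benchmark columns and reproduces all three reported values.

```python
# Recompute the "Average" column as the plain mean of the three benchmarks.
SCORES = {  # (MMLU_IT, ARC_IT, HELLASWAG_IT), copied from the table
    "anakin87/Phi-3.5-mini-ITA": (59.93, 51.5, 61.57),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": (58.43, 48.42, 64.07),
    "microsoft/Phi-3.5-mini-instruct": (60.03, 49.19, 61.25),
}
REPORTED_AVERAGE = {
    "anakin87/Phi-3.5-mini-ITA": 57.67,
    "meta-llama/Meta-Llama-3.1-8B-Instruct": 56.97,
    "microsoft/Phi-3.5-mini-instruct": 56.82,
}


def mean(values: tuple[float, ...]) -> float:
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {mean(scores):.2f} (reported {REPORTED_AVERAGE[model]:.2f})")
```

Because the benchmarks are weighted equally, the per-column scores are worth reading too: Meta-Llama-3.1-8B-Instruct, for example, leads on HELLASWAG_IT.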

## 🎮 Model in action
### Demo
[💬🇮🇹 Chat with the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/Phi-3.5-mini-ITA)

### Text generation with Transformers
The model is small, so it runs smoothly on Colab. It can also be loaded with quantization.

With `transformers==4.44.2`, `trust_remote_code=True` is needed to incorporate a minor bug fix in `Phi3ForCausalLM`.
Read [this discussion](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/discussions/9) for more details.

⚡ *The model is compatible with Flash Attention 2, which accelerates inference. To enable it, uncomment the `attn_implementation` parameter in the code snippet below.*

```python
# pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "anakin87/Phi-3.5-mini-ITA"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",  # UNCOMMENT TO USE FLASH ATTENTION 2
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# "Can you briefly explain the difference between imperfetto and passato prossimo in Italian, and when each is used?"
user_input = "Puoi spiegarmi brevemente la differenza tra imperfetto e passato prossimo in italiano e quando si usano?"
messages = [{"role": "user", "content": user_input}]
# The pipeline applies the chat template to `messages`; such a low temperature makes sampling nearly greedy.
outputs = pipe(messages, max_new_tokens=500, do_sample=True, temperature=0.001)
# With chat input, "generated_text" holds the whole conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])
```

Example output:
```
Certamente! Imperfetto e passato prossimo sono due tempi verbali in italiano che si riferiscono a azioni passate, ma hanno sfumature diverse.

Imperfetto:
- L'imperfetto è usato per descrivere azioni o situazioni passate che erano continue o ripetute nel tempo.
- Indica un'azione senza una fine specifica o un'azione che si svolgeva abitualmente.
- È spesso usato per descrivere situazioni, condizioni o stati passati.
- Esempio: "Quando ero bambino, giocavo spesso nel parco."

Passato Prossimo:
- Il passato prossimo è usato per descrivere azioni passate che sono state completate o che hanno avuto una durata specifica.
- Indica un'azione che è avvenuta in un momento specifico nel passato.
- È spesso usato per descrivere eventi o azioni che hanno una durata definita o che si sono svolte in un momento specifico.
- Esempio: "Ieri ho finito il libro."

In sintesi, l'imperfetto si usa per azioni continue o abituali nel passato, mentre il passato prossimo si usa per azioni completate o avvenute in un momento specifico nel passato.
```
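The Transformers pipeline applies the tokenizer's chat template to `messages` automatically. If you instead run the GGUF files through a plain text-completion interface, you may need to build the prompt string yourself. The helper below is a hypothetical sketch that assumes this model kept the Phi-3.5-mini-instruct chat template (`<|system|>`, `<|user|>`, `<|assistant|>` and `<|end|>` markers); confirm it against the `chat_template` in the tokenizer configuration before use.

```python
# Hypothetical helper: render chat messages into a Phi-3-style prompt string.
# Assumption: the model uses the Phi-3.5-mini-instruct chat template.
def build_phi3_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        if role == "system" and not content:
            continue  # empty system messages are skipped
        # Each turn: role marker, newline, text, then the <|end|> marker.
        parts.append(f"<|{role}|>\n{content}<|end|>\n")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # cue the model to write its answer
    # Note: the real template may append an end-of-sequence token when no
    # generation prompt is requested; this sketch does not.
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_phi3_prompt([{"role": "user", "content": "Ciao! Come stai?"}])
    print(prompt)
```

With Transformers, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the template shipped with the model and is the safer choice.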

### Build AI applications
You can use the model to create a variety of AI applications.

I recommend using the [🏗️ Haystack LLM framework](https://haystack.deepset.ai/) for orchestration.
(spoiler: I work on it and it is open-source 😄)

This model is compatible with the [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator) and [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator) components.
You can also deploy the model with a Text Generation Inference (TGI) container and then use it with [`HuggingFaceAPIGenerator`](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator) and the related `HuggingFaceAPIChatGenerator`.

Some examples to draw inspiration from:
- [RAG with local open models](https://haystack.deepset.ai/blog/guide-to-using-zephyr-with-haystack2)
- [Summarization from a Website](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/hackernews-custom-component-rag.ipynb)
- [Multilingual RAG](https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/multilingual_rag_podcast.ipynb)

## 🔧 Training details
This model was fine-tuned using Hugging Face TRL.
It underwent 2 epochs of instruction fine-tuning on the [FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) and [Capybara-Claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita) datasets. 🙏 Thanks to the authors for providing these datasets.

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623).
The idea is to train only the layers of the model with a high signal-to-noise ratio (SNR) and ❄️ freeze the rest.
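The selection-and-freeze step described above can be sketched in a few lines. This is not the Spectrum implementation: it assumes per-module SNR scores are already available (Spectrum derives them from each weight matrix's singular values), and the module names, SNR values, per-module-type grouping and 25% fraction are all hypothetical choices for illustration.

```python
# Simplified sketch of Spectrum-style layer selection: keep only the
# highest-SNR modules trainable and freeze everything else.
# Assumptions: SNR scores are precomputed; names and values are hypothetical.
from collections import defaultdict

EXAMPLE_SNR = {
    "model.layers.0.self_attn.qkv_proj": 1.2,
    "model.layers.1.self_attn.qkv_proj": 3.4,
    "model.layers.2.self_attn.qkv_proj": 0.8,
    "model.layers.3.self_attn.qkv_proj": 2.1,
    "model.layers.0.mlp.down_proj": 0.5,
    "model.layers.1.mlp.down_proj": 0.9,
    "model.layers.2.mlp.down_proj": 2.7,
    "model.layers.3.mlp.down_proj": 1.1,
}


def select_layers(snr: dict[str, float], top_fraction: float = 0.25) -> set[str]:
    """Pick the top `top_fraction` of modules by SNR within each module type."""
    by_type: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for name, score in snr.items():
        module_type = name.split(".")[-1]  # e.g. "qkv_proj" or "down_proj"
        by_type[module_type].append((score, name))
    selected: set[str] = set()
    for entries in by_type.values():
        entries.sort(reverse=True)  # highest SNR first
        k = max(1, round(len(entries) * top_fraction))
        selected.update(name for _, name in entries[:k])
    return selected


def trainable_flags(param_names: list[str], selected: set[str]) -> dict[str, bool]:
    """True for parameters inside a selected module; everything else stays frozen."""
    return {p: any(p.startswith(m + ".") for m in selected) for p in param_names}


if __name__ == "__main__":
    chosen = select_layers(EXAMPLE_SNR)
    print(sorted(chosen))
    # With a PyTorch model, the flags could be applied roughly like this:
    # for name, param in model.named_parameters():
    #     param.requires_grad = any(name.startswith(m + ".") for m in chosen)
```

Grouping by module type keeps the selection spread across attention and MLP blocks instead of letting one kind of module dominate the trainable set.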

Training required about 14 hours on a single A40 GPU.

I may release a guide/tutorial soon. Stay tuned! 📻