---
base_model: ilsp/Meltemi-7B-Instruct-v1
license: apache-2.0
model_name: Meltemi-7B-Instruct-v1
pipeline_tag: text-generation
quantized_by: SPAHE
tags:
  - finetuned
---

<!-- markdownlint-disable MD041 -->

# Meltemi 7B Instruct v1 - GGUF

- Original model: [Meltemi 7B Instruct v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1)

<!-- description start -->

## Description

This repository contains GGUF format model files for [ilsp's Meltemi 7B Instruct v1](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1). Three variants are provided, one quantized (Q8_0) and two in floating-point format (F16 and F32), to cover different trade-offs between quality, speed, and memory usage.

<!-- description end -->

<!-- README_GGUF.md-provided-files start -->

## Provided files

| Name                                                                                                                                    | Quantization Method | Precision (Bits) | File Size | Max RAM Required | Use Case                                                      |
| --------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | ---------------- | --------- | ---------------- | ------------------------------------------------------------- |
| [meltemi-7b-instruct-v1_q8_0.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7b-instruct-v1_q8_0.gguf) | Q8_0                | 8                | 7.40 GB   | 7.30 GB          | Low quality loss - recommended                                |
| [meltemi-7b-instruct-v1_f16.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7b-instruct-v1_f16.gguf)   | F16                 | 16               | 13.90 GB  | 14.20 GB         | Very large, extremely low quality loss - recommended          |
| [meltemi-7b-instruct-v1_f32.gguf](https://huggingface.co/SPAHE/Meltemi-7B-Instruct-v1-GGUF/blob/main/meltemi-7b-instruct-v1_f32.gguf)   | F32                 | 32               | 27.80 GB  | 29.30 GB         | Very very large, extremely low quality loss - not recommended |

**Note**: The above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
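
As a rough sanity check on the file sizes in the table, the sketch below compares each file's size relative to F16 with the ratio expected from its nominal bits per weight. It assumes that file size scales roughly linearly with bits per weight and that Q8_0 stores blocks of 32 8-bit weights plus one 16-bit scale (8.5 bits per weight); metadata and any tensors kept at a different precision are ignored. The helper names are illustrative only.

```python
# File sizes (GB) copied from the "Provided files" table.
file_size_gb = {"Q8_0": 7.40, "F16": 13.90, "F32": 27.80}

# Nominal bits per weight. The Q8_0 figure assumes 32 int8 weights plus one
# 16-bit scale per block: (32 * 8 + 16) / 32 = 8.5 bits per weight.
bits_per_weight = {"Q8_0": (32 * 8 + 16) / 32, "F16": 16.0, "F32": 32.0}


def observed_ratio_vs_f16(name):
    """Size of the given file relative to the F16 file, from the table."""
    return file_size_gb[name] / file_size_gb["F16"]


def expected_ratio_vs_f16(name):
    """Size ratio predicted from bits per weight alone (an approximation)."""
    return bits_per_weight[name] / bits_per_weight["F16"]


for name in file_size_gb:
    print(
        f"{name}: observed {observed_ratio_vs_f16(name):.3f}, "
        f"expected {expected_ratio_vs_f16(name):.3f}"
    )
```

Under these assumptions the observed and expected ratios agree to within about one percent, which suggests the listed sizes are consistent with one another.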

<!-- README_GGUF.md-provided-files end -->

<!-- README_GGUF.md-how-to-download start -->

## How to Download GGUF Files

### For Manual Downloaders

Cloning the entire repository is not recommended: the files are large, and the repository holds several formats of the same model. Most users need only the single model file that best suits their requirements.

### Automated Download via Client Libraries

For convenience, the following client can automate the download and lets you choose among the available model files:

- **LM Studio**: Provides an integrated environment for downloading and utilizing models directly.

### Downloading with Command Line

The `huggingface-hub` Python library simplifies the process of downloading specific model files. Install the library with:

```shell
pip install huggingface-hub
```

To download a model file directly to your current directory, execute:

```shell
huggingface-cli download SPAHE/Meltemi-7B-Instruct-v1-GGUF meltemi-7b-instruct-v1_q8_0.gguf --local-dir .
```

This command downloads only the GGUF file you name, not the other files in the repository.
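
The same single-file download can also be done from Python. The following is a minimal sketch that uses `hf_hub_download` from the `huggingface-hub` library; the `download_gguf` helper and the `GGUF_FILES` mapping are illustrative names, not part of this repository or of the library.

```python
REPO_ID = "SPAHE/Meltemi-7B-Instruct-v1-GGUF"

# File names as listed in the "Provided files" table.
GGUF_FILES = {
    "q8_0": "meltemi-7b-instruct-v1_q8_0.gguf",
    "f16": "meltemi-7b-instruct-v1_f16.gguf",
    "f32": "meltemi-7b-instruct-v1_f32.gguf",
}


def download_gguf(variant="q8_0", local_dir="."):
    """Download a single GGUF file and return its local path (hypothetical helper)."""
    # Imported here so the mapping above can be used without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=GGUF_FILES[variant],
        local_dir=local_dir,
    )
```

Calling `download_gguf("q8_0")` would then fetch only the Q8_0 file into the current directory.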

<!-- README_GGUF.md-how-to-download end -->

<!-- original-model-card start -->

# Original model card: ilsp's Meltemi 7B Instruct v1

# Meltemi Instruct Large Language Model for the Greek language

We present the Meltemi-7B-Instruct-v1 Large Language Model (LLM), an instruct fine-tuned version of [Meltemi-7B-v1](https://huggingface.co/ilsp/Meltemi-7B-v1).

# Model Information

- Vocabulary extension of the Mistral-7b tokenizer with Greek tokens
- 8192 context length
- Fine-tuned with 100k Greek machine translated instructions extracted from:
  - [Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) (only subsets with permissive licenses)
  - [Evol-Instruct](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)
  - [Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
  - A hand-crafted Greek dataset with multi-turn examples steering the instruction-tuned model towards safe and harmless responses
- Our SFT procedure is based on the [Hugging Face finetuning recipes](https://github.com/huggingface/alignment-handbook)

# Instruction format

The prompt format is the same as the [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) format and can be
utilized through the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/chat_templating) functionality as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("ilsp/Meltemi-7B-Instruct-v1")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Meltemi-7B-Instruct-v1")

model.to(device)

messages = [
    {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
]

# Through the default chat template this translates to
#
# <|system|>
# Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
# <|user|>
# Πες μου αν έχεις συνείδηση.</s>
# <|assistant|>
#

# add_generation_prompt=True appends the trailing <|assistant|> turn shown above
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
input_prompt = prompt.to(device)
outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)

# Decode only the newly generated tokens, not the prompt that precedes them
response = tokenizer.decode(outputs[0][input_prompt.shape[-1]:], skip_special_tokens=True)
print(response)
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.

messages.extend([
    {"role": "assistant", "content": response},
    {"role": "user", "content": "Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;"}
])

# Through the default chat template this translates to
#
# <|system|>
# Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
# <|user|>
# Πες μου αν έχεις συνείδηση.</s>
# <|assistant|>
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.</s>
# <|user|>
# Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;</s>
# <|assistant|>
#

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
input_prompt = prompt.to(device)
outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)

print(tokenizer.batch_decode(outputs)[0])
```
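
When running the GGUF files with a runtime that does not read the tokenizer's chat template, the prompt string may need to be assembled by hand. The sketch below reproduces the layout shown in the comments of the example above. The function name `render_zephyr_prompt` is hypothetical and the exact whitespace is an assumption; the tokenizer's own chat template remains the authoritative reference.

```python
EOS_TOKEN = "</s>"  # end-of-sequence marker as shown in the rendered prompt


def render_zephyr_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the Zephyr-style layout."""
    parts = []
    for message in messages:
        # Each turn: role tag on its own line, then the content closed by EOS.
        parts.append(f"<|{message['role']}|>\n{message['content']}{EOS_TOKEN}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "Είσαι το Μελτέμι."},
    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
]
print(render_zephyr_prompt(example))
```

The shortened system message here is only for illustration; in practice the full system prompt from the example above would be used.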

# Evaluation

The evaluation suite we created includes 6 test sets. The suite is integrated with [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness).

Our evaluation suite includes:

- Four machine-translated versions ([ARC Greek](https://huggingface.co/datasets/ilsp/arc_greek), [Truthful QA Greek](https://huggingface.co/datasets/ilsp/truthful_qa_greek), [HellaSwag Greek](https://huggingface.co/datasets/ilsp/hellaswag_greek), [MMLU Greek](https://huggingface.co/datasets/ilsp/mmlu_greek)) of established English benchmarks for language understanding and reasoning ([ARC Challenge](https://arxiv.org/abs/1803.05457), [Truthful QA](https://arxiv.org/abs/2109.07958), [Hellaswag](https://arxiv.org/abs/1905.07830), [MMLU](https://arxiv.org/abs/2009.03300)).
- An existing benchmark for question answering in Greek ([Belebele](https://arxiv.org/abs/2308.16884))
- A novel benchmark created by the ILSP team for medical question answering based on the medical exams of [DOATAP](https://www.doatap.gr) ([Medical MCQA](https://huggingface.co/datasets/ilsp/medical_mcqa_greek)).

Our evaluation for Meltemi-7b is performed in a few-shot setting, consistent with the settings in the [Open LLM leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). Our training improves performance on every Greek test set, raising the average score by **+14.9** percentage points (from 36.5% to 51.4%). The results for the Greek test sets are shown in the following table:

|            | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
| ---------- | ------------------------- | -------------------- | ---------------------- | -------------------------- | -------------------------- | ---------------- | ------- |
| Mistral 7B | 29.8%                     | 45.0%                | 36.5%                  | 27.1%                      | 45.8%                      | 35%              | 36.5%   |
| Meltemi 7B | 41.0%                     | 63.6%                | 61.6%                  | 43.2%                      | 52.1%                      | 47%              | 51.4%   |
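
The averages and the overall gain can be recomputed directly from the per-benchmark scores. The following sketch simply copies the numbers from the table above and assumes an unweighted mean over the six test sets:

```python
# Per-benchmark scores (%) in table order: Medical MCQA, Belebele,
# HellaSwag, ARC-Challenge, TruthfulQA MC2, MMLU.
scores = {
    "Mistral 7B": [29.8, 45.0, 36.5, 27.1, 45.8, 35.0],
    "Meltemi 7B": [41.0, 63.6, 61.6, 43.2, 52.1, 47.0],
}

# Unweighted mean per model, rounded to one decimal as in the table.
averages = {
    model: round(sum(values) / len(values), 1)
    for model, values in scores.items()
}

# Gain in percentage points between the two averages.
gain = round(averages["Meltemi 7B"] - averages["Mistral 7B"], 1)

print(averages)  # {'Mistral 7B': 36.5, 'Meltemi 7B': 51.4}
print(gain)      # 14.9
```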

# Ethical Considerations

This model has not been aligned with human preferences, and therefore might generate misleading, harmful, and toxic content.

# Acknowledgements

The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community.

<!-- original-model-card end -->