PyTorch
mistral
File size: 8,623 Bytes
40927c8
 
98e13cc
 
 
 
 
40927c8
 
e0110ac
 
2d86351
7c210e5
 
d8fd1db
e0110ac
e9cc961
c4eade6
e9cc961
 
 
 
a033654
d8fd1db
002a533
d26125e
e0110ac
0aa06b3
215ecb7
64674b4
d8ed2fe
 
40927c8
64674b4
 
54384f9
 
d8fd1db
002a533
d26125e
54384f9
0aa06b3
215ecb7
64674b4
693d098
 
 
64674b4
 
54384f9
 
4347362
d8fd1db
4347362
d26125e
4347362
 
 
c9b58fa
4347362
 
d8fd1db
215ecb7
d26125e
215ecb7
 
 
c9b58fa
ec897e6
 
d8fd1db
ec897e6
d8fd1db
ec897e6
568e672
 
 
 
 
 
ec897e6
568e672
 
 
 
 
 
 
 
 
 
 
 
09c1838
568e672
5698793
 
568e672
 
09c1838
568e672
 
 
 
 
 
 
 
 
 
 
 
7ec5904
 
568e672
 
 
 
 
 
 
 
 
 
 
 
 
 
57a1889
 
7ec5904
8a04019
 
 
 
 
ec897e6
 
568e672
 
 
57a1889
568e672
57a1889
568e672
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
57a1889
 
 
 
 
 
 
568e672
 
 
 
e384aa3
568e672
 
 
 
 
 
 
57a1889
 
 
 
ec8e71e
568e672
 
57a1889
 
 
 
 
 
 
568e672
 
57a1889
 
 
 
 
 
 
 
 
 
568e672
57a1889
 
 
28e4465
568e672
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
---
license: apache-2.0
extra_gated_prompt: "We will release in the nearly future."
extra_gated_fields:
  Name: text
  Company: text
  Title: text
---

# Model Card for MediaTek Research Breeze-7B-FC-v1_0

MediaTek Research Breeze-7B-FC (hereinafter referred to as Breeze-7B-FC) is an advanced language model developed by MediaTek Research, building on [Breeze-7B-Base](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v1_0). Breeze-7B-FC extends its predecessor by incorporating a key feature: function calling. These enhancements make Breeze-7B-FC more versatile and capable of handling a wider range of tasks efficiently.


## 🏆 Performance

| Models                                                                                     | #Parameters | Organization | License    | 🧰 Function Calling? | 💬 Instrustion Following? |
|--------------------------------------------------------------------------------------------|-------------|------------|------------|-------------------|----------|
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0)| 7B          | MediaTek Research | Apache 2.0 | ❌  | ✅       |
| [**Breeze-7B-FC-v1_0**](https://huggingface.co/MediaTek-Research/Breeze-7B-FC-v1_0)        | 7B          | MediaTek Research | Apache 2.0 | ✅ | ✅      |
| [Gorilla-OpenFunctions-v2](https://huggingface.co/MediaTek-Research/Breeze-7B-FC-v1_0)     | 7B          | Gorilla LLM       | Apache 2.0 | ✅ | ❌       |
| [GPT-3.5-Turbo-0125](https://openai.com)                                                   |             | OpenAI            | Proprietary| ✅ | ✅      |

**Evaluate function calling on EN benchmark**

We evaluate the performance of function calling on English with benchmark [Berkeley function-calling leaderboard](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html).

| Models                            | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple  | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple  | 
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
| **Breeze-7B-FC-v1_0 (FC)**        | 86.89 |  76.25 | 90.00 | 93.00 | 84.00 | 84.00 | 100.00 | 92.00 | 88.00 | 77.50 |
| Gorilla-OpenFunctions-v2 (FC)     | 85.95 |  60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
| GPT-3.5-Turbo-0125 (FC)           | 72.77 |  4.58  | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |



![](misc/radar_chart_en.png)

**Evaluate function calling on ZHTW benchmark**

We evaluate the performance of function calling on Traditional Chinese with benchmark [function-calling-leaderboard-for-zhtw](https://github.com/mtkresearch/function-calling-leaderboard-for-zhtw).

| Models                            | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple  | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple  | 
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
| **Breeze-7B-FC-v1_0 (FC)**        | 78.18 |  72.50 | 82.00 |	86.00 | 76.50|67.00|88.00|88.00|80.00|60.00|
| Gorilla-OpenFunctions-v2 (FC)     | 75.68 |  53.75 | 84.75 |	86.50 |	72.50 |	68.00 |	92.00 |	92.00 |	62.00 |	72.50 |
| GPT-3.5-Turbo-0125 (FC)           | 66.15 |  7.50  | 83.75 |	83.50 |	73.00 |	65.50 |	88.00 |	84.00 |	72.00 |	40.00 |



![](misc/radar_chart_zhtw.png)


 **Evaluate instrustion following on EN benchmark**

We evaluate the performance of instruction following on English with benchmark [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/README.md).

| | Win | Tie | Lose |
|---|---|---|---|
| **Breeze-7B-FC-v1_0** *v.s.* Breeze-7B-Instruct-v1_0 | 29 (18.1%) | 55 (34.3%) | 76 (47.5%) |


**Evaluate instrustion following on ZHTW benchmark**

We evaluate the performance of instruction following on Traditional Chinese with benchmark [MT-Bench-TC](https://github.com/mtkresearch/TCEval).

| | Win | Tie | Lose |
|---|---|---|---|
| **Breeze-7B-FC-v1_0** *v.s.* Breeze-7B-Instruct-v1_0 | 35 (21.9%) | 73 (45.6%) | 52 (32.5%) |


## 👩‍💻 How to use

**Dependiency**

Install `mtkresearch` package

```
git clone https://github.com/mtkresearch/mtkresearch.git
cd mtkresearch
pip install -e .
```

**Hosting by VLLM**

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model='MediaTek-Research/Breeze-7B-FC-v1_0',
    tensor_parallel_size=num_gpu, # number of gpus
    gpu_memory_utilization=0.7
)

turn_end_token_id = 61876 # <|im_end|>
params = SamplingParams(
    temperature=0.01,
    top_p=0.01,
    max_tokens=4096,
    repetition_penalty=1.1,
    stop_token_ids=[turn_end_token_id]
)

def _inference(prompt, llm, params):
    return llm.generate(prompt, params)[0].outputs[0].text

```

**Instruction following**

```python
from mtkresearch.llm.prompt import MRPromptV2

sys_prompt = ('You are a helpful AI assistant built by MediaTek Research. '
  'The user you are helping speaks Traditional Chinese and comes from Taiwan.')

prompt_engine = MRPromptV2()

conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習?"},
]

prompt = prompt_engine.get_prompt(conversations)


output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result)
# {'role': 'assistant',
#  'content': '深度學習(Deep Learning)是一種機器學習方法,它模仿人類大腦的神經網路結構來
#              處理複雜的數據和任務。在深度學習中,模型由多層人工神經元組成,每個神經元之間有
#              權重連接,並通過非線性轉換進行計算。這些層與層之間的相互作用使模型能夠學習複雜
#              的函數關係或模式,從而解決各種問題,如圖像識別、自然語言理解、語音辨識等。深度
#              學習通常需要大量的數據和強大的計算能力,因此經常使用圖形處理器(GPU)或特殊的
#              加速器來執行。'}
```

**Function Calling**

```python
import json

from mtkresearch.llm.prompt import MRPromptV2

functions = [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
]

def faked_get_current_weather(location, unit=None):
    return {'temperature': 30}

mapping = {
    'get_current_weather': faked_get_current_weather
}

prompt_engine = MRPromptV2()

# stage 1: query
conversations = [
    {"role": "user", "content": "請問台北目前溫度是攝氏幾度?"},
]

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result) 
# {'role': 'assistant', 
#  'tool_calls': [
#    {'id': 'call_U9bYCBRAbF639uUqfwehwSbw', 'type': 'function', 
#     'function': {'name': 'get_current_weather', 'arguments': '{"location": "台北, 台灣", "unit": "celsius"}'}}]}

# stage 2: execute called functions
conversations.append(result)

tool_call = result['tool_calls'][0]
func_name = tool_call['function']['name']
func = mapping[func_name]
arguments = json.loads(tool_call['function']['arguments'])
called_result = func(**arguments)

# stage 3: put executed results
conversations.append(
    {
        'role': 'tool',
        'tool_call_id': tool_call['id'],
        'name': func_name,
        'content': json.dumps(called_result)
    }
)

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str2 = _inference(prompt, llm, params)
result2 = prompt_engine.parse_generated_str(output_str2)
print(result2)
# {'role': 'assistant', 'content': '台北目前的溫度是攝氏30度'}
```