---
pipeline_tag: text-generation
---

# Model Card for Breeze-7B-Instruct-v0.1

Breeze-7B-Instruct-v0.1 is a 7-billion-parameter language model built from Mistral-7B and tailored for Traditional Chinese (TC).
This model adds 30k TC tokens to the original Mistral-7B vocabulary to better adapt to TC and improve inference speed, doubling inference speed relative to the original tokenizer.
To the best of our knowledge, this is the first work on vocabulary expansion in TC.
This model uses 250GB of TC data for continued pre-training and over 1M instances for further supervised fine-tuning.

Breeze-7B-Instruct-v0.1 performs well on both EN and TC benchmarks.
This model outperforms Taiwan-LLM-7B-v2.1-chat, Taiwan-LLM-13B-v2.0-chat, and Yi-6B-Chat on all TC benchmarks and is comparable with Mistral-7B-Instruct-v0.1 on MMLU and MT-Bench in English.

*A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*

## Features

- Vocabulary expanded from 32k to 62k tokens for Traditional Chinese
- Multi-turn dialogue (without special handling for harmfulness)
- 8k context length

## Model Details

- **Finetuned from:** [MediaTek-Research/Breeze-7B-Base-v0.1](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1)
- **Model type:** Causal decoder-only transformer language model
- **Language:** English and Traditional Chinese (zh-tw)

## Base Model Performance

| Models | Size | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MMLU (ACC) |
|----------------------------------------------|--------|--------------|-------------|-------------|------------|
| | | TC, Knowledge | TC, Reasoning | TC, Reasoning | EN, Knowledge |
| | | 5 shot | 3 shot | 5 shot | 5 shot |
| [Yi-34B](https://huggingface.co/01-ai/Yi-34B) | 34B | 63.10 | 84.57 | 49.31 | 77.42 |
| [Qwen-14B](https://huggingface.co/Qwen/Qwen-14B) | 14B | 51.30 | 16.95 \* | 50.69 | 68.83 |
| [Yi-6B](https://huggingface.co/01-ai/Yi-6B) | 6B | 49.63 | 76.61 | 34.72 | 65.35 |
| [Qwen-7B](https://huggingface.co/Qwen/Qwen-7B) | 7B | 42.84 | 0.0 \* | 39.58 | 61.00 |
| [**Breeze-7B-Base-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v0.1) | 7B | 40.35 | 81.13 | 28.47 | 61.63 |
| [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 7B | 36.93 | 79.27 | 27.78 | 64.89 |

\* Few-shot learning cannot effectively guide the model to generate the proper answer.

| Category ACC of TMMLU+ (5 shot) | STEM | Social Science | Humanities | Other |
|-----------------------------------------------------|--------------|----------------|------------|------------|
| Yi-34B | 56.03 | 73.06 | 61.12 | 62.19 |
| Qwen-14B | 46.51 | 58.20 | 51.12 | 49.38 |
| Yi-6B | 41.14 | 57.77 | 50.22 | 49.39 |
| Qwen-7B | 28.25 | 47.80 | 43.14 | 42.17 |
| **Breeze-7B-Base-v0.1** | 35.74 | 46.08 | 40.29 | 39.27 |
| Mistral-7B-v0.1 | 33.01 | 42.23 | 35.86 | 37.63 |

## Chat Model Performance

| Models | Size | TMMLU+ (ACC) | TMMLU+ (ACC) | DRCD (EM) | Table (ACC) | MT-Bench-tw (Score) | MMLU (ACC) | MMLU (ACC) | MT-Bench (Score) |
|--------------------------------------------|--------|--------------|--------------|-----------|-------------|--------|------------|------------|------------------|
| | | TC, Knowledge | TC, Knowledge | TC, Reasoning | TC, Reasoning | TC, Chat | EN, Knowledge | EN, Knowledge | EN, Chat |
| | | 0 shot | 5 shot | 3 shot | 0 shot | 0 shot | 0 shot | 5 shot | 0 shot |
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 34B | 54.87 | | | 36.81 | 6.9 | 71.04 | | 7.6 |
| [Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) | 14B | 48.41 | | | 41.67 | 6.4 | 64.91 | | 7.2 |
| [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) | 6B | 44.79 | | | 25.69 | 5.0 | 59.45 | | 6.0 |
| [gpt-3.5-turbo](https://openai.com) | | 41.76 | | | | 7.1 | 70.00 | | 7.9 |
| [**Breeze-7B-Instruct-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0.1) | 7B | 41.61 | | | 45.83 | 5.7 | 63.26 | | 7.1 |
| [**Breeze-7B-Instruct-64k-v0.1**](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-64k-v0.1) | 7B | 40.99 | | | 36.11 | 5.5 | 63.68 | | 7.1 |
| [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) | 7B | 40.02 | | | 33.33 | 5.4 | 55.94 | | 6.2 |
| [Taiwan-LLM-13B-v2.0-chat](https://huggingface.co/yentinglin/Taiwan-LLM-13B-v2.0-chat) | 13B | 29.47 | | | 23.61 | 5.0 | 50.50 | | -\* |
| [Taiwan-LLM-7B-v2.1-chat](https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat) | 7B | 28.08 | | | 31.25 | 4.2 | 42.72 | | -\* |

\* Taiwan-LLM models respond to multi-turn questions (English) in Traditional Chinese.

| Category ACC of TMMLU+ (0 shot) | STEM | Social Science | Humanities | Other |
|-----------------------------------------------------|--------------|----------------|------------|------------|
| Yi-34B-Chat | 47.65 | 64.25 | 52.73 | 54.91 |
| Qwen-14B-Chat | 43.83 | 55.00 | 48.55 | 46.22 |
| Yi-6B-Chat | 37.80 | 51.74 | 45.36 | 44.25 |
| gpt-3.5-turbo | 41.56 | 46.72 | 36.73 | 42.03 |
| **Breeze-7B-Instruct-v0.1** | 37.41 | 46.81 | 42.06 | 40.16 |
| **Breeze-7B-Instruct-64k-v0.1** | 37.88 | 46.35 | 40.31 | 39.40 |
| Qwen-7B-Chat | 35.44 | 46.22 | 38.35 | 40.06 |
| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 |

## Inference Performance

In this test, we use the first 700 characters of the [web article](https://health.udn.com/health/story/5976/7699252?from=udn_ch1005_main_index) as the input and ask the model to write the same article again.
All inferences run on 2 RTX A6000 GPUs (using `vllm`, with a tensor-parallel size of 2).

| Models | Inference Time (sec) | Estimated Max Input Length (Char) |
|--------------------------------------------------------------------|-------------------|--------------------------|
| Yi-6B | 10.62 | 5.2k |
| **Breeze-7B-Instruct-v0.1** | 10.74 | 11.1k |
| **Breeze-7B-Instruct-64k-v0.1** | 10.74 | 88.8k |
| Qwen-7B | 10.86 | 9.8k |
| Qwen-14B | 18.89 | 9.8k |
| Mistral-7B-v0.1 | 20.48 | 5.1k |
| Taiwan-LLM-7B-v2.1-base | 26.26 | 2.2k |
| Taiwan-LLM-13B-v2.0-base | 36.80 | 2.2k |
| Yi-34B | 43.71 | 4.5k |

## Examples

## Use in Transformers

First, install the direct dependencies:

```
pip install transformers torch accelerate
```

For faster inference with FlashAttention-2, also install these dependencies:

```bash
pip install packaging ninja
pip install flash-attn
```

Then load the model in transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "MediaTek-Research/Breeze-7B-Instruct-v0.1"

# Load the tokenizer that carries the expanded Traditional Chinese vocabulary.
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,  # the repo ID is passed as the first positional argument
    device_map="auto",
    torch_dtype=torch.bfloat16,
    # Optional: needs flash-attn installed and transformers >= 4.36.
    attn_implementation="flash_attention_2",
)
```

The structure of the query template follows that of Mistral-7B-Instruct, as shown below.

```txt
SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```

where `SYS_PROMPT`, `QUERY1`, `RESPONSE1`, and `QUERY2` can be provided by the user.

The suggested default `SYS_PROMPT` is

```txt
You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.
```
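
The query template above can be assembled programmatically. The following is a minimal sketch, not an official utility: `build_prompt` and `DEFAULT_SYS_PROMPT` are hypothetical names introduced here for illustration, and the single space between segments is an assumption read off the template as printed.

```python
# Suggested default system prompt, copied from the model card.
DEFAULT_SYS_PROMPT = (
    "You are a helpful AI assistant built by MediaTek Research. "
    "The user you are helping speaks Traditional Chinese and comes from Taiwan."
)


def build_prompt(query, history=(), sys_prompt=DEFAULT_SYS_PROMPT):
    """Assemble a prompt string that follows the query template.

    `history` is a sequence of (query, response) pairs from earlier turns.
    This is a hypothetical helper, not part of the transformers library.
    """
    parts = [sys_prompt]
    for past_query, past_response in history:
        parts.append(f"[INST] {past_query} [/INST] {past_response}")
    parts.append(f"[INST] {query} [/INST]")
    # Assumption: segments are separated by a single space, as in the template.
    return " ".join(parts)


# Reproduce the template with its placeholder names.
prompt = build_prompt(
    "QUERY2",
    history=[("QUERY1", "RESPONSE1")],
    sys_prompt="SYS_PROMPT",
)
print(prompt)
# SYS_PROMPT [INST] QUERY1 [/INST] RESPONSE1 [INST] QUERY2 [/INST]
```

The resulting string can then be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate`. Generation settings such as `max_new_tokens` are left to the user, since this card does not specify them.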