yentinglin committed
Commit 47c4573
Parent: fa10053

Update README.md

Files changed (1):
  1. README.md +257 -128

README.md CHANGED
---
license: llama3
library_name: transformers
pipeline_tag: text-generation
base_model: meta-llama/Meta-Llama-3-70B
language:
- zh
- en
tags:
- zhtw
widget:
- text: >-
    A chat between a curious user and an artificial intelligence assistant.
    The assistant gives helpful, detailed, and polite answers to the user's
    questions. USER: 你好,請問你可以幫我寫一封推薦信嗎? ASSISTANT:
---

<img src="https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/vlfv5sHbt4hBxb3YwULlU.png" alt="Taiwan LLM Logo" width="600" style="margin-left: auto; margin-right: auto; display: block;"/>

🚀 We're excited to introduce Llama-3-Taiwan-70B, a 70B-parameter model finetuned on a large corpus of Traditional Mandarin and English data using the Llama-3 architecture. It demonstrates state-of-the-art performance on various Traditional Mandarin NLP benchmarks.

The model was trained with the [NVIDIA NeMo™ Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) on NVIDIA Taipei-1, built with [NVIDIA DGX H100](https://www.nvidia.com/en-us/data-center/dgx-h100/) systems.

The compute and data for training Llama-3-Taiwan-70B were generously sponsored by [Chang Gung Memorial Hospital](https://www.cgmh.org.tw/eng), [Chang Chun Group](https://www.ccp.com.tw/ccpweb.nsf/homepage?openagent), [Legalsign.ai](https://legalsign.ai/), [NVIDIA](https://www.nvidia.com/zh-tw/), [Pegatron](https://www.pegatroncorp.com/), [TechOrange](https://buzzorange.com/techorange/), and [Unimicron](https://www.unimicron.com/) (in alphabetical order).

We would like to acknowledge the [contributions](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc1#contributions) of our data providers, team members, and advisors in the development of this model, including [shasha77](https://www.youtube.com/@shasha77) for high-quality YouTube scripts and study materials, [Taiwan AI Labs](https://ailabs.tw/) for providing local media content, [Ubitus K.K.](https://ubitus.net/zh/) for offering gaming content, Professor Yun-Nung (Vivian) Chen for her guidance and advice, Wei-Lin Chen for leading our pretraining data pipeline, Tzu-Han Lin for synthetic data generation, Chang-Sheng Kao for enhancing our synthetic data quality, and Kang-Chieh Chen for cleaning instruction-following data.

# Model Summary

Llama-3-Taiwan-70B is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue. Key features include:

- 70B parameters
- Languages: Traditional Mandarin (zh-tw) and English (en)
- Finetuned on a high-quality Traditional Mandarin and English corpus covering general knowledge as well as industry knowledge in the legal, manufacturing, medical, and electronics domains
- 8K context length
- Open model released under the Llama-3 license

# Training Details

- Training Framework: [NVIDIA NeMo](https://www.nvidia.com/zh-tw/ai-data-science/products/nemo/), [NVIDIA NeMo Megatron](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/megatron.html)
- Inference Framework: [NVIDIA TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)
- Base model: [Llama-3 70B](https://llama.meta.com/llama3/)
- Hardware: [NVIDIA DGX H100](https://www.nvidia.com/zh-tw/data-center/dgx-h100/) on Taipei-1
- Context length: 8K tokens ([128k version](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-128k))
- Batch size: 2M tokens per step (roughly 256 sequences at the 8K context length)

# Evaluation

Check out the [Open TW LLM Leaderboard](https://huggingface.co/spaces/yentinglin/open-tw-llm-leaderboard) for the full and updated list.

| Model | [TMLU](https://arxiv.org/pdf/2403.20180) | Taiwan Truthful QA | [Legal Eval](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1) | [TW MT-Bench](https://huggingface.co/datasets/MediaTek-Research/TCEval-v2) | Long context | Function Calling | [TMMLU+](https://github.com/iKala/ievals) |
|---|---|---|---|---|---|---|---|
| | Subject knowledge | Taiwan localization tests | Taiwan legal exams | Chinese multi-turn dialogue | Long-context support | Function calling | |
| [**yentinglin/Llama-3-Taiwan-70B-Instruct**](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct) | **74.76%** | 80.95% | 68.42% | 7.54 | [128k version](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-128k) | ✅ | 67.53% |
| [**yentinglin/Llama-3-Taiwan-70B-Instruct-DPO**](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-DPO) | 74.60% | **81.75%** | **70.33%** | - | - | ✅ | - |
| [**yentinglin/Llama-3-Taiwan-70B-Instruct-128k**](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-128k) | 73.01% | 80.16% | 63.64% | - | - | ✅ | - |
| [**yentinglin/Llama-3-Taiwan-8B-Instruct**](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct) | 59.50% | 61.11% | 53.11% | 7.21 | [128k version](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct-128k) | ✅ | 52.28% |
| [**yentinglin/Llama-3-Taiwan-8B-Instruct-DPO**](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct-DPO) | 59.88% | 59.52% | 52.63% | - | - | ✅ | - |
| [**yentinglin/Llama-3-Taiwan-8B-Instruct-128k**](https://huggingface.co/yentinglin/Llama-3-Taiwan-8B-Instruct-128k) | - | - | - | - | - | ✅ | - |
| [Claude-3-Opus](https://www.anthropic.com/api) | [73.59% (5-shot)](https://arxiv.org/pdf/2403.20180) | [69.84%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/opus-Taiwan-Truthful-QA) | [60.29%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/opus) | - | 200k | ✅ | - |
| [GPT-4o](https://platform.openai.com/docs/api-reference/chat/create) | [65.56% (0-shot), 69.88% (5-shot)](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/4o-tmlu) | [76.98%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/4o-Taiwan-Truthful-QA) | [53.59%](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc3/tree/main/4o) | - | 128k | ✅ | - |
| [GPT-4-turbo](https://platform.openai.com/docs/api-reference/chat/create) | [70.42% (5-shot)](https://arxiv.org/pdf/2403.20180) | - | - | - | 128k | ✅ | 60.34%^ |
| [Gemini-Pro](https://ai.google.dev/gemini-api/docs) | [61.40% (5-shot)](https://arxiv.org/pdf/2403.20180) | - | - | - | 1000k | ✅ | 49.92%^ |
| [GPT-3.5-turbo-1106](https://platform.openai.com/docs/api-reference/chat/create) | [49.37% (5-shot)](https://arxiv.org/pdf/2403.20180) | - | - | 7.1 | 128k | ✅ | 41.76%^ |
| [Qwen1.5-110B-Chat](https://huggingface.co/Qwen/Qwen1.5-110B-Chat) | **75.69%** | 66.67% | 49.28% | - | 32k | ✅ | 65.81% |
| [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 73.59% | 71.43% | 55.02% | 6.9 | 200k | ✅ | 64.10% |
| [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 70.95% | 65.08% | 52.63% | - | 8k | ✅ | 62.75% |
| [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | 55.57% | 52.38% | 44.98% | - | 64k | ✅ | 52.16% |
| [Breexe-8x7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breexe-8x7B-Instruct-v0_1) | - | - | - | 7.2 | 8k | ❓ | 48.92% |
| [c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus) | 62.87% | 64.29% | 34.45% | - | 128k | ✅ | 49.75% |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 55.81% | 46.83% | 35.89% | - | 8k | ✅ | 43.38% |
| [Breeze-7B-Instruct-v1_0](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) | 55.57% | 52.38% | 39.23% | 6.0 | 32k | ❓ | 41.77% |
| [Llama3-TAIDE-LX-8B-Chat-Alpha1](https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1) | 47.30% | 50.79% | 37.80% | - | 8k | ❓ | 39.03% |
| [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | 40.97% | 37.30% | 27.27% | - | 4k | ❓ | 33.02% |

Numbers are 0-shot by default.

[Eval implementation](https://github.com/adamlin120/lm-evaluation-harness)

^ Taken from the closest matching numbers in the original dataset.
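
For reference, a minimal sketch of launching such a run with the `lm-evaluation-harness` Python API linked above; the `tmlu` task name is hypothetical, so check which tasks the fork actually registers:

```python
# Minimal sketch with the lm-evaluation-harness Python API (v0.4+).
# The `tmlu` task name is hypothetical; check the fork for registered tasks.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=yentinglin/Llama-3-Taiwan-70B-Instruct,dtype=bfloat16",
    tasks=["tmlu"],    # hypothetical task name
    num_fewshot=0,     # numbers in the table are 0-shot by default
    batch_size="auto",
)
print(results["results"])
```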

## Needle in a Haystack Evaluation

The "Needle in a 出師表" evaluation tests the model's ability to locate and recall important information embedded within a large body of text, using the classic Chinese text 《出師表》 (Chu Shi Biao) by 諸葛亮 (Zhuge Liang).

To run the evaluation, use the [script](https://github.com/adamlin120/needle-haystack/tree/main).
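
The core idea can be sketched in a few lines. This is illustrative rather than the linked script; it assumes an OpenAI-compatible endpoint such as the vLLM server from the Get Started section below:

```python
# Illustrative needle-in-a-haystack probe; this is not the linked script.
# Assumes an OpenAI-compatible endpoint (e.g., the vLLM server below).
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# Filler text standing in for the full 《出師表》 corpus.
haystack = "先帝創業未半而中道崩殂,今天下三分,益州疲弊,此誠危急存亡之秋也。" * 100
needle = "通關密語是「台灣黑熊」。"  # the fact the model must find and recall

for depth in (0.0, 0.5, 1.0):  # place the needle at the start, middle, and end
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + needle + haystack[cut:]
    resp = client.chat.completions.create(
        model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
        messages=[
            {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
            {"role": "user", "content": context + "\n\n請問通關密語是什麼?"},
        ],
    )
    print(depth, resp.choices[0].message.content)
```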

# TW MT-Bench Score

- Average Score: 7.5375
- Maximum Score: 10
- Minimum Score: 1
- Median Score: 9.0
- Standard Deviation: 3.03
- Total Number of Scores: 160
- [Model response](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc1/blob/main/Llama-3-Taiwan.jsonl)
- [GPT-4 Eval](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct-rc1/blob/main/gpt-4_single.jsonl)
- [Code forked from `mtkresearch/TCEval` with bug fixes](https://github.com/adamlin120/TCEval)
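
A minimal sketch for recomputing these statistics from the GPT-4 eval file linked above; the `score` field name is an assumption about the JSONL schema:

```python
# Recompute the summary statistics from the GPT-4 judge output.
# Assumes each JSONL line carries a numeric "score" field; check the
# linked gpt-4_single.jsonl for the actual schema.
import json
import statistics

with open("gpt-4_single.jsonl", encoding="utf-8") as f:
    scores = [json.loads(line)["score"] for line in f if line.strip()]

print("Total Number of Scores:", len(scores))
print("Average Score:", statistics.mean(scores))
print("Median Score:", statistics.median(scores))
print("Standard Deviation:", statistics.stdev(scores))  # sample std dev
print("Min/Max:", min(scores), max(scores))
```

Under these assumptions, this should roughly reproduce the figures above.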

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/hfstnJElgB39Um4Bco11D.png)

# Use Cases

Llama-3-Taiwan-70B can be applied to a wide variety of NLP tasks in Traditional Chinese and English, including:

<details>
<summary>1. Multi-turn dialogue (多輪對話)</summary>

<pre>
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: 嗨,你好!

Assistant: 你好!我今天可以如何協助你?

User: 就想聊天而已....

Assistant: 當然,我很樂意和你聊天!有什麼特別想談論的嗎?
</pre>

</details>

<details>
<summary>2. RAG (Retrieval-Augmented Generation, 檢索增強生成)</summary>

Demo: turn on `Search Web` at [twllm.com](https://twllm.com/)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/Zcl20NsPmY2vObIaMtT8G.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/xuXEHIkMgDEB7xxDd3JoL.png)

<pre>
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: 明天台大天氣如何?

Assistant: 根據天氣預報,明天台大天氣為陣雨。低溫 24°C。10 至 15 公里/小時的東風。降雨機率為 40%。
</pre>

</details>
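
For API users, the same pattern amounts to injecting retrieved passages into the prompt. A minimal sketch against the vLLM endpoint from the Get Started section below; the `retrieve` helper is hypothetical and stands in for whatever search backend you use:

```python
# Minimal RAG sketch: the retrieval backend is hypothetical, the endpoint
# is the vLLM OpenAI-compatible server from the Get Started section.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

def retrieve(query: str) -> list[str]:
    """Hypothetical search helper; replace with your own web/vector search."""
    return ["明日台北市:陣雨,降雨機率 40%,低溫 24°C,東風 10 至 15 公里/小時。"]

question = "明天台大天氣如何?"
context = "\n".join(retrieve(question))

resp = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
        {"role": "user", "content": f"參考資料:\n{context}\n\n問題:{question}"},
    ],
)
print(resp.choices[0].message.content)
```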

<details>
<summary>3. Structured output, language understanding, entity recognition, function calling (格式化輸出、語言理解、實體識別、函數調用)</summary>

If you are interested in function calling, we strongly recommend [using constrained decoding](https://huggingface.co/docs/text-generation-inference/basic_tutorials/using_guidance) to turn on `json mode`.
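
A minimal sketch of constrained decoding, assuming the model is served with Text Generation Inference as in the tutorial linked above; the endpoint URL and schema are illustrative:

```python
# Constrained decoding sketch for a TGI (text-generation-inference) server.
# The endpoint and schema are illustrative; see the guidance tutorial above.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# JSON schema the output must conform to
schema = {
    "type": "object",
    "properties": {
        "classification": {"type": "integer", "enum": [-1, 1]},
    },
    "required": ["classification"],
}

output = client.text_generation(
    "分析以下評論的情緒,正向輸出 1,負向輸出 -1:這部電影太好看了!",
    max_new_tokens=64,
    grammar={"type": "json", "value": schema},  # forces valid JSON matching the schema
)
print(output)  # e.g. {"classification": 1}
```

With vLLM, a similar effect can be achieved via its guided decoding options (e.g., `guided_json`) where supported.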

Example from HW7 of `INTRODUCTION TO GENERATIVE AI 2024 SPRING` by `HUNG-YI LEE (李宏毅)`:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5df9c78eda6d0311fd3d541f/dCGo92llL3_ammH_Nu2-c.png)

<pre>
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: 您是電影評論分析師,負責情緒分析。
分析每個被評論的句子,評估句子中每個字和標點符號的重要性,
並且將結果使用 tuple 的結構列出。
tuple 值的範圍介於 -1 (負向) 到 1 (正向) 之間,
然後,將評論分類為正向 (1) 或負向 (-1)。
這些需要被評論的句子,包含在 <review> 標籤中間。

輸出範例:
[(<單字或標點符號>, <float importance>), ...]
<int classification>
<review> 根據外媒報導,宏都拉斯在去年3月與台灣斷交,並轉往與中國建交之後,卻遲遲等不到中國的金援,談好要進口宏都拉斯的白蝦也拖延許久,甚至還將白蝦價格殺價至過往與台灣交易的半價,再加上墨西哥同時也中斷了與宏國的白蝦貿易,對當地蝦農可說是雪上加霜。 </review>

Assistant: [('根據', 0.2), ('外媒', 0.1), ('報導', -0.1), (',', 0.05), ('宏都拉斯', -0.3), ('在去年3月與台灣斷交, ', -0.5), ('並轉往與中國建交之後, ', -0.4), ('卻遲遲等不到中國的金援, ', -1),
('談好要進口宏都拉斯的白蝦也拖延許久, ', -1),
(',甚至還將白蝦價格殺價至過往與台灣交易的半價,再加上墨西哥同時也中斷了與宏國的白蝦貿易,對當地蝦農可說是雪上加霜。 ',-1)]
-1
</pre>
</details>

# Get Started

*Caveat: a system message should always be set.*

## Hugging Face Transformers library

You can use Llama-3-Taiwan-70B with the Hugging Face Transformers library:
```python
import torch
from transformers import pipeline, StoppingCriteria

# Define a custom stopping criteria class
class EosListStoppingCriteria(StoppingCriteria):
    def __init__(self, eos_sequence=[128256]):
        self.eos_sequence = eos_sequence

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop once the most recent tokens match the EOS sequence
        last_ids = input_ids[:, -len(self.eos_sequence):].tolist()
        return self.eos_sequence in last_ids

# Initialize the model with automatic device mapping
llm = pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1", device_map="auto")
tokenizer = llm.tokenizer

# Define a conversation example
chat = [
    {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
    {"role": "user", "content": "你好,請問你可以完成什麼任務?"},
    {"role": "assistant", "content": "你好,我可以幫助您解決各種問題、提供資訊並協助完成多種任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。"},
    {"role": "user", "content": "太棒了!"}
]
flatten_chat_for_generation = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# The flattened prompt looks like:
"""
<|im_start|>user
You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.<|im_end|>
<|im_start|>user
你好,請問你可以完成什麼任務?<|im_end|>
<|im_start|>assistant
你好,我可以幫助您解決各種問題、提供資訊和協助您完成許多不同的任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。<|im_end|>
<|im_start|>user
太棒了!<|im_end|>
<|im_start|>assistant
"""

# Generate a response using the custom stopping criteria
output = llm(flatten_chat_for_generation, return_full_text=False, max_new_tokens=128, top_p=0.9, temperature=0.7, stopping_criteria=[EosListStoppingCriteria([tokenizer.eos_token_id])])
print(output[0]['generated_text'])
# "謝謝!很高興能夠為您服務。如果有任何其他需要協助的地方,請隨時與我聯繫。我會盡最大努力為您提供所需的支援。"
```

## vLLM

Start the server:
```bash
export NUM_GPUS=4
export PORT=8000

docker run \
  -e HF_TOKEN=$HF_TOKEN \
  --gpus '"device=0,1,2,3"' \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p "${PORT}:8000" \
  --ipc=host \
  vllm/vllm-openai:v0.4.0.post1 \
  --model "yentinglin/Llama-3-Taiwan-70B-Instruct-rc1" \
  -tp "${NUM_GPUS}"
```

Sample client code (or use any OpenAI-API-compatible client):
```python
# pip install "openai>=1.0.0"
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ]
)
print("Chat response:", chat_response)
```

Enjoy exploring the capabilities of Llama-3-Taiwan-70B! We look forward to seeing what you create with this powerful open-source model. If you have any questions or feedback, please let us know.

# Contributions
- [**Professor Yun-Nung (Vivian) Chen**](https://www.csie.ntu.edu.tw/~yvchen/), for her guidance and advice throughout the project.
- [**Wei-Lin Chen**](mailto:wlchen@nlg.csie.ntu.edu.tw), for leading our pretraining data pipeline.
- [**Tzu-Han Lin**](mailto:b08902126@csie.ntu.edu.tw), for synthetic data generation.
- [**Chang-Sheng Kao**](mailto:cliff.cskao@gmail.com), for enhancing our synthetic data quality.
- [**Kang-Chieh Chen**](mailto:b09902125@csie.ntu.edu.tw), for cleaning instruction-following data.
- [**Min-Yi Chen**](mailto:minyi_chen@ccpgp.com) and [**Shao-Heng Hsu**](mailto:sh_hsu@ccpgp.com), for collecting chemical engineering data and benchmarks.

# Citation
```bibtex
@article{DBLP:journals/corr/abs-2311-17487,
  author     = {Yen{-}Ting Lin and
                Yun{-}Nung Chen},
  title      = {Taiwan {LLM:} Bridging the Linguistic Divide with a Culturally Aligned
                Language Model},
  journal    = {CoRR},
  volume     = {abs/2311.17487},
  year       = {2023},
  url        = {https://doi.org/10.48550/arXiv.2311.17487},
  doi        = {10.48550/ARXIV.2311.17487},
  eprinttype = {arXiv},
  eprint     = {2311.17487},
  timestamp  = {Tue, 05 Dec 2023 14:40:42 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2311-17487.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2403-20180,
  author     = {Po{-}Heng Chen and
                Sijia Cheng and
                Wei{-}Lin Chen and
                Yen{-}Ting Lin and
                Yun{-}Nung Chen},
  title      = {Measuring Taiwanese Mandarin Language Understanding},
  journal    = {CoRR},
  volume     = {abs/2403.20180},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2403.20180},
  doi        = {10.48550/ARXIV.2403.20180},
  eprinttype = {arXiv},
  eprint     = {2403.20180},
  timestamp  = {Wed, 10 Apr 2024 17:37:45 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2403-20180.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```