---
license: llama3.1
pipeline_tag: text-generation
---
# **Llama3.1-Typhoon2-70B**: Thai Large Language Model (Instruct)
**Llama3.1-Typhoon2-70B-instruct** is an instruct Thai 🇹🇭 large language model with 70 billion parameters, based on Llama3.1-70B.
For the technical report, please see our [arXiv paper](https://arxiv.org/abs/2412.13702).
*To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3.1" in the model name.*
## **Performance**
**Instruction-Following & Function Call Performance**
<div align="center">
<img src="https://storage.googleapis.com/typhoon-public/assets/typhoon2-text/llama70b_general.png" alt="Typhoon2 70B General Performance" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
</div>
**Specific Domain Performance (Math & Coding)**
<div align="center">
<img src="https://storage.googleapis.com/typhoon-public/assets/typhoon2-text/llama70b_specific.png" alt="Typhoon2 70B Specific Domain Performance" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
</div>
**Long Context Performance**
<div align="center">
<img src="https://storage.googleapis.com/typhoon-public/assets/typhoon2-text/llama70b_long.jpg" alt="Typhoon2 70B Long Context Performance" width="100%" style="margin-left: auto; margin-right: auto; display: block;"/>
</div>
**Detailed Performance**
| Model | IFEval - TH | IFEval - EN | MT-Bench TH | MT-Bench EN | Thai Code-Switching (t=0.7) | Thai Code-Switching (t=1.0) | FunctionCall-TH | FunctionCall-EN | GSM8K-TH | GSM8K-EN | MATH-TH | MATH-EN | HumanEval-TH | HumanEval-EN | MBPP-TH | MBPP-EN |
|--------------------------------|-------------|-------------|-------------|-------------|--------------------------------|--------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|-------------|-------------|-----------|-----------|
| **Typhoon2 Llama3.1 70B Instruct**| **81.45%** | 88.72% | **7.3626** | 8.8562 | **98.8%** | **94.8%** | **70.8%** | 65.7% | **88.79%** | **93.43%** | **59.60%** | 64.96% | 79.9% | 83.5% | 86.0% | 84.9% |
| **Llama3.3 70B Instruct** | 81.01% | **91.51%** | 6.7967 | 8.8343 | 72.6% | 39.2% | 50.3% | 56.3% | 61.63% | 87.71% | 44.37% | **73.58%** | 81.7% | 84.1% | 84.9% | 87.3% |
| **Openthaigpt1.5 72B** | 80.37% | 84.56% | 7.3131 | **9.0893** | 95.6% | 50.4% | 67.1% | **74.6%** | 79.15% | 89.91% | 43.65% | 81.8% | **81.7%** | **84.8%** | **88.9%** | **89.7%** |
## **Model Description**
- **Model type**: A 70B instruct decoder-only model based on Llama architecture.
- **Requirement**: transformers 4.45.0 or newer (a quick version check is sketched after this list).
- **Context length**: 90k
- **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
- **License**: [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
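
Since older transformers releases do not ship the Llama 3.1 chat template, a minimal sketch to fail fast on an outdated install (assuming `packaging` is available; it is a dependency of transformers):

```python
import transformers
from packaging import version

# Guard: the model card requires transformers 4.45.0 or newer
required = "4.45.0"
assert version.parse(transformers.__version__) >= version.parse(required), (
    f"transformers {transformers.__version__} found; please upgrade to >= {required}"
)
```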
## Usage Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/llama3.1-typhoon2-70b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a male AI assistant named Typhoon created by SCB 10X to be helpful, harmless, and honest. Typhoon is happy to help with analysis, question answering, math, coding, creative writing, teaching, role-play, general discussion, and all sorts of other tasks. Typhoon responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Typhoon avoids starting responses with the word “Certainly” in any way. Typhoon follows this information in all languages, and always responds to the user in the language they use or request. Typhoon is now being connected with a human. Write in fluid, conversational prose, Show genuine interest in understanding requests, Express appropriate emotions and empathy. Also showing information in term that is easy to understand and visualized."},
    {"role": "user", "content": "ขอสูตรไก่ย่าง"},  # "Please give me a grilled chicken recipe"
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the EOS token or the Llama 3.1 end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)

# Decode only the newly generated tokens, not the prompt
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
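
For interactive use, the same generation call can stream tokens as they are produced. A minimal sketch using transformers' `TextStreamer`, reusing `model`, `tokenizer`, `input_ids`, and `terminators` from the example above:

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated,
# omitting the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    streamer=streamer,
)
```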
## Inference Server Hosting Example
```bash
pip install vllm
vllm serve scb10x/llama3.1-typhoon2-70b-instruct --tensor-parallel-size 2 --gpu-memory-utilization 0.95 --max-model-len 16384
# Hosting the 70B model requires at least two 80GB GPUs (e.g., A100, H100).
# Serving a longer context requires 4-8 GPUs.
# See https://docs.vllm.ai/ for more information.
```
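
Once the server is up, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal client sketch, assuming the `openai` Python package is installed and the server runs on its default port:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server accepts a placeholder API key by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="scb10x/llama3.1-typhoon2-70b-instruct",
    messages=[{"role": "user", "content": "ขอสูตรไก่ย่าง"}],  # "Please give me a grilled chicken recipe"
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
```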
## Function-Call Example
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import ast

model_name = "scb10x/llama3.1-typhoon2-70b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

get_weather_api = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, New York",
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The unit of temperature to return",
            },
        },
        "required": ["location"],
    },
}

search_api = {
    "name": "search",
    "description": "Search for information on the internet",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query, e.g. 'latest news on AI'",
            }
        },
        "required": ["query"],
    },
}

get_stock = {
    "name": "get_stock_price",
    "description": "Get the stock price",
    "parameters": {
        "type": "object",
        "properties": {
            "symbol": {
                "type": "string",
                "description": "The stock symbol, e.g. AAPL, GOOG",
            }
        },
        "required": ["symbol"],
    },
}

# Tool definitions use the same format as OpenAI tools
openai_format_tools = [get_weather_api, search_api, get_stock]

messages = [
    {"role": "system", "content": "You are an expert in composing functions."},
    {"role": "user", "content": "ขอราคาหุ้น Tasla (TLS) และ Amazon (AMZ) ?"},  # "What are the stock prices of Tasla (TLS) and Amazon (AMZ)?"
]

inputs = tokenizer.apply_chat_template(
    messages, tools=openai_format_tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=1,
    eos_token_id=[tokenizer.eos_token_id, 128009],  # 128009 is the <|eot_id|> token id
)
response = outputs[0][inputs.shape[-1]:]
print("Output:", tokenizer.decode(response, skip_special_tokens=True))
# Decoding utilities for parsing the model's function-call output
def resolve_ast_by_type(value):
    if isinstance(value, ast.Constant):
        if value.value is Ellipsis:
            output = "..."
        else:
            output = value.value
    elif isinstance(value, ast.UnaryOp):
        # Assumes a unary minus over a numeric constant
        output = -value.operand.value
    elif isinstance(value, ast.List):
        output = [resolve_ast_by_type(v) for v in value.elts]
    elif isinstance(value, ast.Dict):
        output = {
            resolve_ast_by_type(k): resolve_ast_by_type(v)
            for k, v in zip(value.keys, value.values)
        }
    elif isinstance(value, ast.NameConstant):  # handle boolean values
        output = value.value
    elif isinstance(value, ast.BinOp):  # handle expressions such as 1 + 2 as arguments
        output = eval(ast.unparse(value))
    elif isinstance(value, ast.Name):
        output = value.id
    elif isinstance(value, ast.Call):
        if len(value.keywords) == 0:
            output = ast.unparse(value)
        else:
            output = resolve_ast_call(value)
    elif isinstance(value, ast.Tuple):
        output = tuple(resolve_ast_by_type(v) for v in value.elts)
    elif isinstance(value, ast.Lambda):
        # Evaluate the lambda expression into a callable
        output = eval(ast.unparse(value))
    elif isinstance(value, ast.Ellipsis):
        output = "..."
    elif isinstance(value, ast.Subscript):
        # Render subscript expressions such as a[0] back to source text
        output = ast.unparse(value)
    else:
        raise Exception(f"Unsupported AST type: {type(value)}")
    return output


def resolve_ast_call(elem):
    # Rebuild a possibly dotted function name (e.g. module.func) from the AST
    func_parts = []
    func_part = elem.func
    while isinstance(func_part, ast.Attribute):
        func_parts.append(func_part.attr)
        func_part = func_part.value
    if isinstance(func_part, ast.Name):
        func_parts.append(func_part.id)
    func_name = ".".join(reversed(func_parts))
    args_dict = {}
    for arg in elem.keywords:
        output = resolve_ast_by_type(arg.value)
        args_dict[arg.arg] = output
    return {func_name: args_dict}


def ast_parse(input_str, language="Python"):
    if language == "Python":
        cleaned_input = input_str.strip("[]'")
        parsed = ast.parse(cleaned_input, mode="eval")
        extracted = []
        if isinstance(parsed.body, ast.Call):
            extracted.append(resolve_ast_call(parsed.body))
        else:
            for elem in parsed.body.elts:
                assert isinstance(elem, ast.Call)
                extracted.append(resolve_ast_call(elem))
        return extracted
    else:
        raise NotImplementedError(f"Unsupported language: {language}")


def parse_nested_value(value):
    """
    Parse a potentially nested value from the AST output.

    Args:
        value: The value to parse, which could be a nested dictionary (i.e., another function call) or a simple value.

    Returns:
        str: A string representation of the value, handling nested function calls and nested dictionary function arguments.
    """
    if isinstance(value, dict):
        # Check if the dictionary represents a function call (i.e., every value is another dictionary)
        if all(isinstance(v, dict) for v in value.values()):
            func_name = list(value.keys())[0]
            args = value[func_name]
            args_str = ", ".join(
                f"{k}={parse_nested_value(v)}" for k, v in args.items()
            )
            return f"{func_name}({args_str})"
        else:
            # Otherwise treat it as plain key-value pairs
            return (
                "{"
                + ", ".join(f"'{k}': {parse_nested_value(v)}" for k, v in value.items())
                + "}"
            )
    return repr(value)


def default_decode_ast_prompting(result, language="Python"):
    # Normalize the raw model output into a bracketed list of calls, then parse it
    result = result.strip("`\n ")
    if not result.startswith("["):
        result = "[" + result
    if not result.endswith("]"):
        result = result + "]"
    decoded_output = ast_parse(result, language)
    return decoded_output


fc_result = default_decode_ast_prompting(tokenizer.decode(response, skip_special_tokens=True))
print(fc_result)  # e.g. [{'get_stock_price': {'symbol': 'TLS'}}, {'get_stock_price': {'symbol': 'AMZ'}}]
```
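
Since the decoded result maps each function name to its keyword arguments, dispatching to real implementations is a dictionary lookup. A minimal sketch, where `get_stock_price` is a hypothetical stand-in for an actual data source:

```python
# Hypothetical tool implementation; replace with a real stock-price lookup
def get_stock_price(symbol: str) -> str:
    return f"{symbol}: 123.45 USD"

# Map tool names from the schemas above to Python callables
tool_registry = {"get_stock_price": get_stock_price}

for call in fc_result:
    for func_name, kwargs in call.items():
        if func_name in tool_registry:
            print(tool_registry[func_name](**kwargs))
```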
## **Intended Uses & Limitations**
This model is an instruction-tuned model that is still under active development. It incorporates some level of guardrails, but it may still produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.
## **Follow us**
**https://twitter.com/opentyphoon**
## **Support**
**https://discord.gg/CqyBscMFpg**
## **Citation**
- If you find Typhoon2 useful for your work, please cite it using:
```bibtex
@misc{typhoon2,
title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
year={2024},
eprint={2412.13702},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13702},
}
``` |