---
license: mit
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png)
# **Introduction**
We introduce Motif, a new family of language models from [**Moreh**](https://moreh.io/), specialized in Korean and English.\
Motif-102B-Instruct is a chat model tuned from the base model [Motif-102B](https://huggingface.co/moreh/Motif-102B).
## Training Platform
- Motif-102B was trained on the [**MoAI platform**](https://moreh.io/product) with AMD MI250 GPUs.
- The MoAI platform simplifies scalable, cost-efficient training of large-scale models across multiple nodes.
- It also provides optimized, automated parallelization without complex manual work.
- More information on the MoAI platform is available at https://moreh.io/product.
- You can also contact us directly at [contact@moreh.io](mailto:contact@moreh.io).
## Quick Usage
You can chat directly with our model Motif through our [Model hub](https://model-hub.moreh.io/).
## Details
More details will be provided in the upcoming technical report.
### Release Date
2024.09.30
### Benchmark Results
| Model | KMMLU |
|------------------------------|-------|
| GPT-4-base-0613 \*\* | 57.62 |
| Llama3.1-70B-instruct \* | 52.1 |
| **Motif-102B** \*\* \+ | 58.25 |
| Motif-102B-Instruct \*\* \+ | 57.98 |

\* : Community reported\
\*\* : Measured by the authors\
\+ : Indicates the model is specialized in Korean
## How to use
### Use with vLLM
- Minimum requirements: 4xA100 80GB GPUs
- Refer to this [link](https://github.com/vllm-project/vllm) to install vLLM.
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# We recommend at least 4x A100 80GB GPUs for inference with vLLM.
# If you have more GPUs, set tensor_parallel_size to the number of GPUs you can afford.
model = LLM("moreh/Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Motif-102B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartner"
    {"role": "user", "content": "μœ μΉ˜μ›μƒμ—κ²Œ λΉ…λ±… 이둠의 κ°œλ…μ„ μ„€λͺ…ν•΄λ³΄μ„Έμš”"},
]
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so set sampling parameters explicitly.
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)
print(responses[0].outputs[0].text)
```
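The same `LLM` object can also generate responses for several conversations in one call. Below is a minimal sketch of batched generation, reusing the `model`, `tokenizer`, and `sampling_params` defined above; the prompts themselves are only illustrative.

```python
# A minimal sketch of batched generation with vLLM, reusing the `model`,
# `tokenizer`, and `sampling_params` defined above. The prompts are illustrative.
conversations = [
    [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Summarize the history of Hangul in two sentences."},
    ],
    [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of Korea?"},
    ],
]

# Render every conversation with the chat template, then generate them in a single batch.
prompts = [
    tokenizer.apply_chat_template(conversation=c, add_generation_prompt=True, tokenize=False)
    for c in conversations
]
responses = model.generate(prompts, sampling_params=sampling_params)
for r in responses:
    print(r.outputs[0].text)
```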
### Use with transformers
- Minimum requirements: 4x A100 80GB GPUs or 4x AMD MI250 GPUs
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Motif-102B-Instruct"

# All generation configs are set in generation_config.json.
# Load in bfloat16 and shard across available GPUs; a 100B-scale model does not fit on a single 80GB GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartner"
    {"role": "user", "content": "μœ μΉ˜μ›μƒμ—κ²Œ λΉ…λ±… 이둠의 κ°œλ…μ„ μ„€λͺ…ν•΄λ³΄μ„Έμš”"},
]
prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)

input_ids = tokenizer(prompt, return_tensors='pt')['input_ids'].to(model.device)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
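For interactive use, output can also be printed token by token as it is generated. Below is a minimal sketch using the `TextStreamer` utility from `transformers`, reusing the `model`, `tokenizer`, and `input_ids` from the block above; the `max_new_tokens` value is only an example.

```python
from transformers import TextStreamer

# A minimal sketch of streamed generation, reusing `model`, `tokenizer`,
# and `input_ids` from the previous block. Tokens are printed as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, streamer=streamer, max_new_tokens=512)
```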