|
---
license: mit
---
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png) |
|
|
|
# **Introduction** |
|
We introduce Motif, a new family of language models from [**Moreh**](https://moreh.io/), specialized in Korean and English.
|
Motif-102B-Instruct is a chat model tuned from the base model [Motif-102B](https://huggingface.co/moreh/Motif-102B). |
|
|
|
## Training Platform |
|
- Motif-102B was trained on the [**MoAI platform**](https://moreh.io/product) using AMD MI250 GPUs.
- The MoAI platform simplifies scalable, cost-efficient training of large-scale models across multiple nodes.
- It also provides optimized, automated parallelization without complex manual work.
- More information on the MoAI platform is available at https://moreh.io/product, or you can contact us directly at [contact@moreh.io](mailto:contact@moreh.io).
|
|
|
## Quick Usage |
|
You can chat with Motif directly through our [Model hub](https://model-hub.moreh.io/).
|
|
|
## Details |
|
More details will be provided in the upcoming technical report. |
|
|
|
### Release Date |
|
2024.09.30 |
|
|
|
### Benchmark Results |
|
|
|
| Model | KMMLU |
|------------------------------|-------|
| GPT-4-base-0613 \*\* | 57.62 |
| Llama3.1-70B-instruct \* | 52.1 |
| **Motif-102B** \*\* + | 58.25 |
| Motif-102B-Instruct \*\* + | 57.98 |

\* : Community reported

\*\* : Measured by the authors

\+ : Indicates the model is specialized in Korean
|
|
|
|
|
## How to Use
|
|
|
### Use with vLLM |
|
- Minimum requirements: 4x A100 80GB GPUs
- To install vLLM, refer to the [vLLM repository](https://github.com/vllm-project/vllm) (typically `pip install vllm`).
|
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# At minimum, we recommend 4x A100 80GB GPUs for inference with vLLM.
# If you have more GPUs, set tensor_parallel_size to the number of GPUs you can afford.
model = LLM("moreh/Motif-102B-Instruct", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Motif-102B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Explain the concept of the Big Bang theory to a kindergartener"},
]

# Render the conversation into a single prompt string with the chat template.
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so set sampling parameters explicitly.
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
```
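vLLM processes a list of prompts in one batched call, so several conversations can be generated together. Below is a minimal sketch reusing the `model`, `tokenizer`, and `sampling_params` objects from above; the prompts themselves are illustrative:

```python
# Render each conversation into a prompt string, then generate for all of them at once.
conversations = [
    [{"role": "user", "content": "Summarize the Big Bang theory in one sentence."}],
    [{"role": "user", "content": "What is the capital of South Korea?"}],
]
prompts = [
    tokenizer.apply_chat_template(conversation=c, add_generation_prompt=True, tokenize=False)
    for c in conversations
]
responses = model.generate(prompts, sampling_params=sampling_params)
for response in responses:
    print(response.outputs[0].text)
```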
|
|
|
### Use with transformers |
|
- Minimum requirements: 4x A100 80GB GPUs or 4x AMD MI250 GPUs
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moreh/Motif-102B-Instruct"

# All generation defaults are read from the model's generation_config.json.
# A 102B model does not fit on a single GPU, so shard it across all available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Explain the concept of the Big Bang theory to a kindergartener"},
]

prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(prompt, return_tensors='pt')['input_ids'].to(model.device)

outputs = model.generate(input_ids)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
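For interactive testing, it can be convenient to stream tokens to stdout as they are generated. Here is a minimal sketch using transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `input_ids` from the block above:

```python
from transformers import TextStreamer

# Prints decoded text incrementally as tokens are generated; skip_prompt hides the input prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_ids, streamer=streamer)
```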