base_model:
- shisa-ai/shisa-v1-llama3-8b
- aixsatoshi/Llama-3-youko-8b-instruct-chatvector
- meta-llama/Meta-Llama-3-8B-Instruct
- lightblue/suzume-llama-3-8B-multilingual
library_name: transformers
tags:
- mergekit
- merge
license: llama3
language:
- ja
Llama-3-Umievo-itr014-Shizuko-8b
This model is an evolutionary merge of four Llama-3-based models that support Japanese, produced with an evolutionary algorithm. The four source models are Meta-Llama-3-8B-Instruct, Llama-3-youko-8b-instruct-chatvector, suzume-llama-3-8B-multilingual, and shisa-v1-llama3-8b. Many thanks to Meta, aixsatoshi, LightBlue, and Shisa-AI, the creators of the models used in this merge.
The model scored an average of 3.85 on the ELYZA-tasks-100 benchmark (mean of three automatic evaluation runs judged by Llama3-70B).
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "umiyuki/Llama-3-Umievo-itr014-Shizuko-8b"

# Load the tokenizer and the merged model in bfloat16, letting device_map place the weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The system prompt tells the model to answer in Japanese; the user prompt asks for
# a story about two girls travelling through a post-apocalyptic world.
messages = [
    {"role": "system", "content": "You must answer all responses in Japanese.あなたは役に立つ誠実な日本人のアシスタントです。あなたは全ての回答に日本語で答えなければならない。"},
    {"role": "user", "content": "二人の少女が終末世界を旅する物語を書いてください。"},
]

# Render the conversation with the chat template and append the assistant header.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS token or the Llama 3 end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged with the linear merge method, using meta-llama/Meta-Llama-3-8B-Instruct as the base.
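As a rough illustration of what a linear merge computes, the sketch below takes a weighted sum of same-shaped parameter vectors and, when normalization is on, first divides the weights by their total. It is a minimal sketch, not mergekit's implementation: the function name `linear_merge` and the use of plain Python lists in place of tensors are assumptions made for illustration.

```python
from typing import List, Sequence


def linear_merge(
    tensors: Sequence[Sequence[float]],
    weights: Sequence[float],
    normalize: bool = True,
) -> List[float]:
    """Weighted sum of same-shaped parameter vectors (hypothetical helper).

    With normalize=True the weights are rescaled to sum to 1, so the result
    is a weighted average rather than a raw weighted sum.
    """
    if not tensors:
        raise ValueError("need at least one tensor")
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per tensor")
    length = len(tensors[0])
    if any(len(t) != length for t in tensors):
        raise ValueError("all tensors must have the same shape")

    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights sum to zero, cannot normalize")
        scale = [w / total for w in weights]
    else:
        scale = list(weights)

    # Element-wise weighted sum across all input vectors.
    return [sum(s * t[i] for s, t in zip(scale, tensors)) for i in range(length)]


# Toy example: two "models", each with a single 3-element parameter.
merged = linear_merge([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], weights=[3.0, 1.0])
print(merged)  # [1.5, 2.0, 2.5]
```

With normalization, only the ratio between the weights matters: weights of 3.0 and 1.0 give the same result as 0.75 and 0.25.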
Models Merged
The following models were included in the merge:
- shisa-ai/shisa-v1-llama3-8b
- aixsatoshi/Llama-3-youko-8b-instruct-chatvector
- lightblue/suzume-llama-3-8B-multilingual
Configuration
The following YAML configuration was used to produce this model:
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
merge_method: linear
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 4]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4149739730274144
  - layer_range: [0, 4]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6781276007090549
  - layer_range: [0, 4]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.34616999273932425
  - layer_range: [0, 4]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.3720042419649354
- sources:
  - layer_range: [4, 8]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.07652836818139683
  - layer_range: [4, 8]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.234379009181979
  - layer_range: [4, 8]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.0146729889059811
  - layer_range: [4, 8]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.5811532109389872
- sources:
  - layer_range: [8, 12]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.5551700273906248
  - layer_range: [8, 12]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.7418501521559635
  - layer_range: [8, 12]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.442504375594772
  - layer_range: [8, 12]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.6475631873316974
- sources:
  - layer_range: [12, 16]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4227647782669271
  - layer_range: [12, 16]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.2969869792284983
  - layer_range: [12, 16]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.7818773805802817
  - layer_range: [12, 16]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.8007371182560976
- sources:
  - layer_range: [16, 20]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.10979010874744283
  - layer_range: [16, 20]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.19009547180175693
  - layer_range: [16, 20]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.6064294349661996
  - layer_range: [16, 20]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.7630087852386511
- sources:
  - layer_range: [20, 24]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.219671192433268
  - layer_range: [20, 24]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6303503074132494
  - layer_range: [20, 24]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.46265431269055757
  - layer_range: [20, 24]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.4662350856064592
- sources:
  - layer_range: [24, 28]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.1400550380200451
  - layer_range: [24, 28]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.031570135674053
  - layer_range: [24, 28]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5760956440228217
  - layer_range: [24, 28]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.5264012437679564
- sources:
  - layer_range: [28, 32]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.2311282964552015
  - layer_range: [28, 32]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.43811773040605967
  - layer_range: [28, 32]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5150682019605872
  - layer_range: [28, 32]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.342193342214983
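The raw weights in the configuration do not sum to 1 within a slice, so they are easier to read as relative shares. The sketch below converts the weights of the first and last slices into shares, assuming that `normalize: 1.0` is treated as enabled and that normalization means dividing each weight by the slice total. The weights are copied from the configuration; the helper names `normalized_shares` and `dominant_model` are hypothetical and not part of mergekit.

```python
from typing import Dict

# Weights copied from the configuration for two of the eight slices.
SLICE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "layer_range [0, 4]": {
        "lightblue/suzume-llama-3-8B-multilingual": 0.4149739730274144,
        "meta-llama/Meta-Llama-3-8B-Instruct": 0.6781276007090549,
        "aixsatoshi/Llama-3-youko-8b-instruct-chatvector": 0.34616999273932425,
        "shisa-ai/shisa-v1-llama3-8b": 1.3720042419649354,
    },
    "layer_range [28, 32]": {
        "lightblue/suzume-llama-3-8B-multilingual": 1.2311282964552015,
        "meta-llama/Meta-Llama-3-8B-Instruct": 0.43811773040605967,
        "aixsatoshi/Llama-3-youko-8b-instruct-chatvector": 0.5150682019605872,
        "shisa-ai/shisa-v1-llama3-8b": 0.342193342214983,
    },
}


def normalized_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Divide each weight by the slice total (assumed normalization rule)."""
    total = sum(weights.values())
    return {model: w / total for model, w in weights.items()}


def dominant_model(weights: Dict[str, float]) -> str:
    """Return the model with the largest weight in a slice."""
    return max(weights, key=lambda model: weights[model])


for slice_name, weights in SLICE_WEIGHTS.items():
    print(slice_name)
    shares = normalized_shares(weights)
    # List models from largest to smallest share.
    for model, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {model}: {share:.3f}")
```

In these two slices, shisa-v1-llama3-8b carries the largest weight in the earliest layers and suzume-llama-3-8B-multilingual in the final layers. The same calculation can be repeated for the remaining six slices.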
Built with Meta Llama 3
Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved