---
base_model:
- shisa-ai/shisa-v1-llama3-8b
- aixsatoshi/Llama-3-youko-8b-instruct-chatvector
- meta-llama/Meta-Llama-3-8B-Instruct
- lightblue/suzume-llama-3-8B-multilingual
library_name: transformers
tags:
- mergekit
- merge
license: llama3
language:
- ja
---

# Llama-3-Umievo-itr014-Shizuko-8b

このモデルは日本語に対応しているLlama-3ベースの4つのモデルを進化的アルゴリズムで進化的マージしたものです。Meta-Llama-3-8B-Instruct、Llama-3-youko-8b-instruct-chatvector、suzume-llama-3-8B-multilingual、shisa-v1-llama3-8bの4つのモデルを使用させていただきました。

マージに使用させていただいたモデル制作者のMeta、aixsatoshiさん、LightBlue、Shisa-AIのみなさまに感謝します。

This model is an evolutionary merge of four Japanese-capable Llama-3-based models, produced with an evolutionary algorithm: Meta-Llama-3-8B-Instruct, Llama-3-youko-8b-instruct-chatvector, suzume-llama-3-8B-multilingual, and shisa-v1-llama3-8b. We would like to thank the model creators Meta, aixsatoshi, LightBlue, and Shisa-AI for allowing us to use their models for the merge.

ElyzaTasks100ベンチマークで平均点が3.85でした。(Llama3-70Bによる自動評価を3回行った平均点)

The average score on the ElyzaTasks100 benchmark was 3.85 (the mean of three automatic evaluations by Llama3-70B).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630420b4eedc089484c853e8/x4BbxfaW_wXPjDfv1Z4lJ.png)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "umiyuki/Llama-3-Umievo-itr014-Shizuko-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You must answer all responses in Japanese.あなたは役に立つ誠実な日本人のアシスタントです。あなたは全ての回答に日本語で答えなければならない。"},
    {"role": "user", "content": "二人の少女が終末世界を旅する物語を書いてください。"},
]

# Build the prompt with the Llama 3 chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the base model.
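For reference, the linear method takes a weighted average of corresponding parameter tensors across the source models; the per-slice weights listed in the configuration further down were found by the evolutionary search. The snippet below is only a minimal illustrative sketch of that averaging step (with `normalize` mirroring the `normalize: 1.0` setting), not mergekit's actual implementation; the helper name and the stand-in tensors are hypothetical.

```python
import torch


def linear_merge(tensors, weights, normalize=True):
    """Return the weighted average of matching parameter tensors (illustrative only)."""
    if normalize:
        # Rescale the weights so they sum to 1, as mergekit's normalize option does.
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = torch.zeros_like(tensors[0])
    for tensor, weight in zip(tensors, weights):
        merged += weight * tensor
    return merged


# Hypothetical example: average one 4x4 stand-in tensor per source model,
# using the evolved weights for the [0, 4] layer slice from the configuration below.
slice_0_4_weights = [
    0.4149739730274144,   # lightblue/suzume-llama-3-8B-multilingual
    0.6781276007090549,   # meta-llama/Meta-Llama-3-8B-Instruct
    0.34616999273932425,  # aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    1.3720042419649354,   # shisa-ai/shisa-v1-llama3-8b
]
merged = linear_merge([torch.randn(4, 4) for _ in range(4)], slice_0_4_weights)
```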
### Models Merged

The following models were included in the merge:
* [shisa-ai/shisa-v1-llama3-8b](https://huggingface.co/shisa-ai/shisa-v1-llama3-8b)
* [aixsatoshi/Llama-3-youko-8b-instruct-chatvector](https://huggingface.co/aixsatoshi/Llama-3-youko-8b-instruct-chatvector)
* [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
dtype: bfloat16
merge_method: linear
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 4]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4149739730274144
  - layer_range: [0, 4]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6781276007090549
  - layer_range: [0, 4]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.34616999273932425
  - layer_range: [0, 4]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.3720042419649354
- sources:
  - layer_range: [4, 8]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.07652836818139683
  - layer_range: [4, 8]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.234379009181979
  - layer_range: [4, 8]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.0146729889059811
  - layer_range: [4, 8]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.5811532109389872
- sources:
  - layer_range: [8, 12]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.5551700273906248
  - layer_range: [8, 12]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.7418501521559635
  - layer_range: [8, 12]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 1.442504375594772
  - layer_range: [8, 12]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.6475631873316974
- sources:
  - layer_range: [12, 16]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.4227647782669271
  - layer_range: [12, 16]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.2969869792284983
  - layer_range: [12, 16]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.7818773805802817
  - layer_range: [12, 16]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.8007371182560976
- sources:
  - layer_range: [16, 20]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.10979010874744283
  - layer_range: [16, 20]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.19009547180175693
  - layer_range: [16, 20]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.6064294349661996
  - layer_range: [16, 20]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.7630087852386511
- sources:
  - layer_range: [20, 24]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.219671192433268
  - layer_range: [20, 24]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.6303503074132494
  - layer_range: [20, 24]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.46265431269055757
  - layer_range: [20, 24]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.4662350856064592
- sources:
  - layer_range: [24, 28]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 0.1400550380200451
  - layer_range: [24, 28]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.031570135674053
  - layer_range: [24, 28]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5760956440228217
  - layer_range: [24, 28]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.5264012437679564
- sources:
  - layer_range: [28, 32]
    model: lightblue/suzume-llama-3-8B-multilingual
    parameters:
      weight: 1.2311282964552015
  - layer_range: [28, 32]
    model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.43811773040605967
  - layer_range: [28, 32]
    model: aixsatoshi/Llama-3-youko-8b-instruct-chatvector
    parameters:
      weight: 0.5150682019605872
  - layer_range: [28, 32]
    model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 0.342193342214983
```

Built with Meta Llama 3

Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
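Note: to reproduce the merge from the configuration above, saving it to a file and running mergekit's `mergekit-yaml` entry point (for example, `mergekit-yaml config.yaml ./output-model-directory`) should rebuild the merged weights, assuming you have access to all four source models; the exact command-line options may differ between mergekit versions.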