	---
	base_model:
	- elyza/ELYZA-japanese-Llama-2-13b
	- elyza/ELYZA-japanese-Llama-2-13b-instruct
	license: llama2
	language:
	- ja
	tags:
	- mergekit
	- merge
	- MoE
	---
	# ELYZA-japanese-Llama-2-MoE-2x13B-v0.1
	[English description here](#description)


	## 概要
	Llama-2ベースの学習済み日本語モデルである[elyza/ELYZA-japanese-Llama-2-13b](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b)と、そのinstruction tuningモデルである[elyza/ELYZA-japanese-Llama-2-13b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b-instruct)
	を、[mergekit](https://github.com/cg123/mergekit)を使ってMoEを行い作成したモデルです。

	[GGUF版はこちら](https://huggingface.co/Aratako/ELYZA-japanese-Llama-2-MoE-2x13B-v0.1-GGUF)

	以下2モデルを利用しています。
	- [elyza/ELYZA-japanese-Llama-2-13b](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b)
	- [elyza/ELYZA-japanese-Llama-2-13b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b-instruct)

	## ライセンス
	元モデルの通り、Llama2ライセンスを継承します。

	Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

	## ベンチマーク
	ベースとしたELYZA-japanese-Llama-2-13b-instructと本モデルの[japanese-mt-bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge)の結果は以下の通りです。
	（シングルターン）
	|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
	|---|---|---|---|---|---|---|---|---|---|---|
	| ELYZA-japanese-Llama-2-13b-instruct | 13B | 3.7 | 6.0 | 6.6 | 2.4 | 2.5 | 5.2 | 5.8 | 7.2 | 4.925 |
	| This model | 2x13B | 3.7 | 6.9 | 6.3 | 3.7 | 4.4 | 6.0 | 7.0 | 7.4 | 5.675 |

	![レーダーチャート](./japanese_mt_bench.png)

	ベンチマークに使用したプロンプト
	```
	"""<s>[INST] <<SYS>>
	あなたは誠実で優秀な日本人のアシスタントです。
	<</SYS>>

	{instruction} [/INST]"""
	```

	## Description
	This model is a Mixture of Experts (MoE) built with [mergekit](https://github.com/cg123/mergekit) from [elyza/ELYZA-japanese-Llama-2-13b](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b), a pretrained Japanese model based on Llama 2, and its instruction-tuned variant [elyza/ELYZA-japanese-Llama-2-13b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b-instruct).

	[Click here for the GGUF version](https://huggingface.co/Aratako/ELYZA-japanese-Llama-2-MoE-2x13B-v0.1-GGUF)

	It utilizes the following two models:
	- [elyza/ELYZA-japanese-Llama-2-13b](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b)
	- [elyza/ELYZA-japanese-Llama-2-13b-instruct](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-13b-instruct)

	## License
	This model inherits the Llama 2 license from its source models.

	Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

	## Benchmark
	The [japanese-mt-bench](https://github.com/Stability-AI/FastChat/tree/jp-stable/fastchat/llm_judge) results for this model and for ELYZA-japanese-Llama-2-13b-instruct, which it is based on, are as follows.
	(Single turn)
	|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
	|---|---|---|---|---|---|---|---|---|---|---|
	| ELYZA-japanese-Llama-2-13b-instruct | 13B | 3.7 | 6.0 | 6.6 | 2.4 | 2.5 | 5.2 | 5.8 | 7.2 | 4.925 |
	| This model | 2x13B | 3.7 | 6.9 | 6.3 | 3.7 | 4.4 | 6.0 | 7.0 | 7.4 | 5.675 |

	![Radar chart](./japanese_mt_bench.png)
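
The `avg_score` column appears to be the unweighted mean of the eight category scores. The short check below is a sketch with the table values copied by hand; it reproduces both averages. The names `scores` and `average_score` are illustrative only and are not part of any benchmark tooling.

```python
# Category scores copied from the table above, in column order:
# Coding, Extraction, Humanities, Math, Reasoning, Roleplay, STEM, Writing
scores = {
    "ELYZA-japanese-Llama-2-13b-instruct": [3.7, 6.0, 6.6, 2.4, 2.5, 5.2, 5.8, 7.2],
    "This model": [3.7, 6.9, 6.3, 3.7, 4.4, 6.0, 7.0, 7.4],
}


def average_score(category_scores):
    """Unweighted mean of the per-category scores, rounded to 3 decimals."""
    return round(sum(category_scores) / len(category_scores), 3)


averages = {name: average_score(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

The largest per-category gains in the table are in Reasoning (2.5 to 4.4) and Math (2.4 to 3.7), while Humanities drops slightly (6.6 to 6.3).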

	Prompt used for the benchmark:
	```
	"""<s>[INST] <<SYS>>
	あなたは誠実で優秀な日本人のアシスタントです。
	<</SYS>>

	{instruction} [/INST]"""
	```
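
As a minimal sketch, the template above can be filled in with plain string formatting. The helper name `build_prompt` and the sample instruction are assumptions made for illustration; only the template text and the system prompt come from this card.

```python
# Template copied from the benchmark prompt above; only {instruction} varies.
SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"
PROMPT_TEMPLATE = "<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"


def build_prompt(instruction: str, system: str = SYSTEM_PROMPT) -> str:
    """Fill the Llama 2 chat template with a single-turn instruction."""
    return PROMPT_TEMPLATE.format(system=system, instruction=instruction)


# Hypothetical sample instruction ("What is the capital of Japan?").
prompt = build_prompt("日本の首都はどこですか？")
print(prompt)
```

Because the template already contains the literal `<s>` token, a tokenizer that adds a beginning-of-sequence token on its own would likely duplicate it. With Hugging Face tokenizers, passing `add_special_tokens=False` when encoding is one way to avoid that.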

	## Merge config
	[mergekit_moe_config.yml](./mergekit_moe_config.yml)
	```yaml
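	base_model: ./ELYZA-japanese-Llama-2-13b-instruct
	gate_mode: random  # router gates are randomly initialized
	dtype: bfloat16
	experts:
	  # Expert 1: the instruction-tuned model (no routing prompts given)
	  - source_model: ./ELYZA-japanese-Llama-2-13b-instruct
	    positive_prompts: []
	  # Expert 2: the pretrained model (no routing prompts given)
	  - source_model: ./ELYZA-japanese-Llama-2-13b
	    positive_prompts: []
	tokenizer_source: model:./ELYZA-japanese-Llama-2-13b-instruct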
	```
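
To illustrate what a randomly initialized gate means for a two-expert model, here is a toy sketch of Mixtral-style routing. It is not mergekit's code. The hidden size, the toy expert functions, and names such as `moe_forward` are hypothetical, and it assumes both experts are selected for every token (top-2 of 2).

```python
import math
import random

random.seed(0)
HIDDEN = 4        # toy hidden size (real models use thousands of dimensions)
NUM_EXPERTS = 2   # one expert per source model in the config above
TOP_K = 2         # assumption: both experts are used for every token

# Randomly initialized router: one weight vector per expert.
router = [[random.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

# Stand-ins for the two experts' feed-forward blocks (hypothetical toy functions).
experts = [
    lambda h: [2.0 * x for x in h],  # plays the role of the instruct model's MLP
    lambda h: [x + 1.0 for x in h],  # plays the role of the base model's MLP
]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(hidden):
    """Route one token's hidden state through the top-k experts."""
    # Router logits: dot product of each expert's gate vector with the hidden state.
    logits = [sum(w * x for w, x in zip(row, hidden)) for row in router]
    # Pick the k highest-scoring experts and normalize their weights.
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    weights = softmax([logits[i] for i in top])
    # Output is the gate-weighted sum of the selected experts' outputs.
    output = [0.0] * HIDDEN
    for weight, idx in zip(weights, top):
        expert_out = experts[idx](hidden)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, dict(zip(top, weights))


output, gate_weights = moe_forward([0.5, -1.0, 0.25, 2.0])
print(gate_weights)
print(output)
```

Since the router in this sketch is random rather than trained, the split between the two experts is arbitrary; the weights sum to 1 but carry no learned meaning.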