---
license: llama2
language:
  - ja
tags:
  - moe
---

# youri-2x7b_dev

This model is a Mixture of Experts (MoE) merge of the following two models:

- [rinna/youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat)
- [rinna/youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction)

## 🏆 Evaluation

All benchmark scores were evaluated with the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness). The raw results are stored in `benchmark_scores`. For detailed information on the scores and the conditions under which they were obtained, please refer to this link.
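
For reference, a single task score can be reproduced through the harness's Python API. This is a minimal sketch: the task identifier below (including its template-version suffix) is an assumption, since exact task names depend on the harness branch and release; check the repo's task list for the names actually used.

```python
# Sketch: scoring one benchmark with lm-evaluation-harness's Python API.
# Assumes the Stability-AI fork is installed; the task id is hypothetical.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=HachiML/youri-2x7b_dev",
    tasks=["jcommonsenseqa-1.1-0.3"],  # hypothetical task/template id
    num_fewshot=3,                     # 3-shot, as in the table below
    device="cuda",
)
print(results["results"])
```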

| Model | JCommonsenseQA (3-shot, acc.) | JNLI (3-shot, balanced acc.) | MARC-ja (0-shot, balanced acc.) | JSQuAD (2-shot, F1) | 4-AVERAGE |
|---|---|---|---|---|---|
| youri-2x7b_dev | 91.15 | 71.03 | 95.90 | 91.30 | 87.34 |
| youri-7b-instruction *1 | 88.83 | 63.56 | 93.78 | 92.19 | 84.59 |
| youri-7b-chat *1 | 91.78 | 70.35 | 96.69 | 79.62 | 84.61 |

| Model | jaqket-v2 (1-shot, F1) | xlsum (1-shot, ROUGE-2) *2 | 6-AVERAGE |
|---|---|---|---|
| youri-2x7b_dev | 84.59 | 25.62 | 76.59 |
| youri-7b-instruction *1 | 83.92 | 24.67 | 75.13 |
| youri-7b-chat *1 | 83.71 | 24.21 | 75.33 |

| Model | xwinograd (0-shot, acc.) *2 | mgsm (5-shot, acc.) *2 | JCoLA (2-shot, balanced acc.) *2 | 9-AVERAGE |
|---|---|---|---|---|
| youri-2x7b_dev | 81.43 | 24.80 | 59.09 | 69.43 |
| youri-7b-instruction *1 | 78.94 | 17.20 | 54.04 | 66.35 |
| youri-7b-chat *1 | 80.92 | 25.20 | 53.78 | 67.36 |

*1 Scores taken from rinna's LM Benchmark.
*2 rinna's LM Benchmark does not mention the template versions for these tasks, so the scores were computed without specifying a template.
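
For the youri-2x7b_dev row, the 4-, 6-, and 9-AVERAGE columns appear to be cumulative means over the first 4, 6, and 9 benchmarks across the three tables (the *1 rows' averages are taken from rinna's benchmark and may be composed differently). A quick check; last-digit differences such as 76.60 vs. 76.59 presumably come from the published averages being computed on unrounded raw scores:

```python
# Cumulative means over youri-2x7b_dev's nine scores, in table order.
scores = [
    91.15, 71.03, 95.90, 91.30,  # JCommonsenseQA, JNLI, MARC-ja, JSQuAD
    84.59, 25.62,                # jaqket-v2, xlsum
    81.43, 24.80, 59.09,         # xwinograd, mgsm, JCoLA
]
for n in (4, 6, 9):
    print(f"{n}-AVERAGE: {sum(scores[:n]) / n:.2f}")
```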

## 🧩 Configuration

This model was built with a custom version of the mergekit library (mixtral branch), using the following configuration:

```yaml
base_model: rinna/youri-7b-chat
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: rinna/youri-7b-chat
    positive_prompts:
      # "Given a question and answer choices as input, select the answer from the choices."
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
      # "Answer the relationship between the premise and the hypothesis: entailment, contradiction, or neutral."
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。"
      # "Classify the following text into either the positive or negative sentiment class."
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。"
      # "Below is a combination of an instruction describing a task and input with context. Write a response that appropriately satisfies the request."
      - "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
  - source_model: rinna/youri-7b-instruction
    positive_prompts:
      # "Extract the answer to the question from the title and text in one word. Answer with a noun."
      - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。"
      # "Summarize the given news article."
      - "与えられたニュース記事を要約してください。"
      # "Answer whether the given sentence is grammatical."
      - "与えられた文が文法的であるかを回答してください。"
```

The positive_prompts in the above configuration are extracted from the instructions of the benchmarks each model excels at. For each model's benchmark results, see rinna's LM Benchmark, which gives a detailed overview of the areas where each individual model performs particularly well and guided the prompt selection for the merged model.
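
For intuition, `gate_mode: hidden` initializes each layer's router from hidden-state representations of the positive prompts, so tokens resembling an expert's prompts are preferentially routed to that expert. The sketch below is a conceptual illustration of that idea, not mergekit's actual code; the helper name and the mean-pooling choice are mine.

```python
# Conceptual sketch of "hidden" gate initialization (not mergekit's code):
# embed each expert's positive prompts with the base model, average the
# per-layer hidden states, and use them as that expert's router rows.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("rinna/youri-7b-chat", output_hidden_states=True)
tok = AutoTokenizer.from_pretrained("rinna/youri-7b-chat")

def expert_gate_vectors(prompts):
    """One mean hidden-state vector per layer for a set of prompts (hypothetical helper)."""
    per_prompt = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            hs = base(**ids).hidden_states  # tuple of (num_layers + 1) x [1, seq, dim]
        per_prompt.append(torch.stack([h.mean(dim=1).squeeze(0) for h in hs[1:]]))
    return torch.stack(per_prompt).mean(dim=0)  # [num_layers, hidden_dim]

# Router weight at layer L: one row per expert, taken from its gate vectors.
# gates = [expert_gate_vectors(e) for e in experts]
# W_router[L] = torch.stack([g[L] for g in gates])
```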

## 💻 Usage

```bash
!pip install -q --upgrade transformers einops accelerate bitsandbytes
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HachiML/youri-2x7b_dev"
torch.set_default_device("cuda")

# Load the model (4-bit quantized) and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Create the Alpaca-style prompt
instruction = "次の日本語を英語に翻訳してください。"  # "Translate the following Japanese into English."
input_text = "大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"
prompt = f"""
以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。

### 指示:
{instruction}

### 入力:
{input_text}

### 応答:
"""

# Tokenize the input string
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Generate text using the model
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode and print the output
output = tokenizer.decode(output_ids.tolist()[0])
print(output)
```
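
The decoded text above includes the prompt itself. If you only want the generated response, one common pattern (a sketch, continuing from the variables above) is to slice off the prompt tokens before decoding:

```python
# Decode only the newly generated tokens (everything after the prompt).
response = tokenizer.decode(
    output_ids[0][token_ids.shape[1]:],
    skip_special_tokens=True,
)
print(response)
```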