Uploaded model

Developed by: aolans
License: gemma
Finetuned from model: google/gemma-2-9b

Model

A reasoning model created by fine-tuning on chain-of-thought (CoT) data, following the article 「日本語reasoningモデルを作る」 ("Building a Japanese reasoning model").

・Purpose

The goal is to improve accuracy on (a variant of) ELYZA-tasks-100.
(The model may not be practical for general use.)

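・Parent model

Fine-tuned from google/gemma-2-9b with Unsloth.

When the base model is loaded through Unsloth, Unsloth/Gemma-2-9b (the 4-bit quantized version) is substituted automatically, so Gemma-2-9B was downloaded locally and that local copy was used as the base.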
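The local-download workaround described above might look roughly like the following. This is a sketch under stated assumptions, not the author's training script: the function name `load_local_base`, the directory `./gemma-2-9b`, the `max_seq_length` value, and `load_in_4bit=False` are all illustrative choices that do not appear in the original.

```python
def load_local_base(local_dir="./gemma-2-9b", max_seq_length=2048):
    """Download google/gemma-2-9b into local_dir and load it with Unsloth.

    Hypothetical helper: the name, directory, and defaults are assumptions.
    """
    # Imports are deferred so the function can be defined on a machine
    # without a GPU; calling it requires unsloth and huggingface_hub.
    from unsloth import FastLanguageModel
    from huggingface_hub import snapshot_download

    # google/gemma-2-9b is a gated repository: accept the license on the
    # Hub and authenticate before downloading.
    snapshot_download(repo_id="google/gemma-2-9b", local_dir=local_dir)

    # Pass the local directory instead of the Hub repository id so the
    # downloaded weights are used as the base.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=local_dir,
        max_seq_length=max_seq_length,  # assumed value
        dtype=None,                     # let Unsloth pick the dtype
        load_in_4bit=False,             # assumption: load unquantized weights
    )
    return model, tokenizer
```

Calling `load_local_base()` needs a GPU environment with Unsloth installed and a Hugging Face login that has access to the gated Gemma weights.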

・Datasets

First, SFT was run on the following datasets to teach the model Japanese.

CohereForAI/aya_dataset  (* English and Japanese data only)
Kendamarron/jimba-instuction-1k-beta  (* for long-form output)
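Restricting a multilingual dataset to English and Japanese rows can be sketched in plain Python as below. The rows are toy data, the helper `keep_english_and_japanese` is hypothetical, and the column names `inputs`, `targets`, and `language` are assumptions about the dataset schema rather than something stated in this card.

```python
KEEP_LANGUAGES = {"English", "Japanese"}


def keep_english_and_japanese(rows, language_key="language"):
    """Return only the rows whose language is English or Japanese."""
    return [row for row in rows if row.get(language_key) in KEEP_LANGUAGES]


# Toy rows standing in for dataset records (not real data).
sample_rows = [
    {"inputs": "What is 2 + 2?", "targets": "4", "language": "English"},
    {"inputs": "日本の首都は？", "targets": "東京です。", "language": "Japanese"},
    {"inputs": "¿Cuál es la capital de España?", "targets": "Madrid.", "language": "Spanish"},
]

filtered = keep_english_and_japanese(sample_rows)
print([row["language"] for row in filtered])  # ['English', 'Japanese']
```

With the `datasets` library the same predicate could be passed to `Dataset.filter`, which is left out here to keep the sketch dependency-free.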

Next, SFT was run on the following dataset to add CoT capability.
Examples with difficulty "very easy" or "easy", plus a subset of "medium", were used.

Kendamarron/Magpie-Tanuki-8B-CoT
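The difficulty-based selection could be sketched as follows. The card does not say how large the "medium" subset was or how it was chosen, so the `medium_fraction` parameter, the random sampling, the seed, and the helper name `select_by_difficulty` are all assumptions for illustration; only the `difficulty` field and its three values come from the description above.

```python
import random


def select_by_difficulty(rows, medium_fraction=0.5, seed=0):
    """Keep all 'very easy' and 'easy' rows plus a random share of 'medium'."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    always_keep = {"very easy", "easy"}

    selected = [r for r in rows if r.get("difficulty") in always_keep]
    medium = [r for r in rows if r.get("difficulty") == "medium"]

    # Assumed policy: sample a fixed fraction of the medium rows.
    n_medium = int(len(medium) * medium_fraction)
    selected.extend(rng.sample(medium, n_medium))
    return selected


# Toy rows standing in for dataset records (not real data).
toy_rows = (
    [{"id": i, "difficulty": "very easy"} for i in range(2)]
    + [{"id": i, "difficulty": "easy"} for i in range(2, 4)]
    + [{"id": i, "difficulty": "medium"} for i in range(4, 14)]
    + [{"id": i, "difficulty": "hard"} for i in range(14, 17)]
)

subset = select_by_difficulty(toy_rows, medium_fraction=0.5)
print(len(subset))  # 9: 2 very easy + 2 easy + 5 of the 10 medium rows
```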

・References

Usage

!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install -U torch

from unsloth import FastLanguageModel
import torch
from huggingface_hub import hf_hub_download
import importlib.util

model_name = "aolans/gemma-2-9b-it-2e-cot"

# *** Load the model and tokenizer (using Unsloth) ***
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    dtype=None,
    load_in_4bit=True,
    trust_remote_code=True,
)

# *** Switch the model to inference mode ***
FastLanguageModel.for_inference(model)

# *** Load the custom helper functions from the model repository ***
file_path = hf_hub_download(
    repo_id=model_name,
    filename="custom_functions.py",
)
spec = importlib.util.spec_from_file_location("custom_functions", file_path)
custom_functions = importlib.util.module_from_spec(spec)
spec.loader.exec_module(custom_functions)

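# Question
#   Pass the user's question exactly as written.
#   The custom function below builds the system prompt around it.
#   (Named `question` rather than `input` to avoid shadowing the built-in.)
question = "1から10までの整数を足すと？"  # "What is the sum of the integers from 1 to 10?"

# *** Build the CoT prompt (system prompt) ***
prompt = custom_functions.add_system_prompt(question)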

# *** Run inference ***
inputs = tokenizer([prompt], return_tensors = "pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True, do_sample=False, repetition_penalty=1.2)
prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)

# *** Extract the result ***
# Returns:
#   output   the response
#   thought  the reasoning process
output, thought = custom_functions.extract_output(prediction)

# print(thought)
print(output)
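The contents of custom_functions.py are not shown in this card. Purely to illustrate the general shape such helpers could take, here is a hypothetical sketch: the function names `build_cot_prompt` and `split_thought_and_output`, the prompt wording, the `### Response` marker, and the `<thought>` / `<output>` tags are all invented for this example and are not the model's actual prompt format. Use the repository's own `add_system_prompt` and `extract_output` for real inference.

```python
import re

# Hypothetical system prompt and markers; NOT the model's real format.
SYSTEM_PROMPT = (
    "Think step by step inside <thought>...</thought>, "
    "then give the final answer inside <output>...</output>."
)
RESPONSE_MARKER = "### Response\n"


def build_cot_prompt(question):
    """Wrap the raw user question in the (assumed) CoT prompt template."""
    return f"{SYSTEM_PROMPT}\n\n### Question\n{question}\n\n{RESPONSE_MARKER}"


def split_thought_and_output(prediction):
    """Return (output, thought) parsed from the decoded generation."""
    # The decoded text still contains the prompt, so only inspect the part
    # after the last response marker.
    response = prediction.rsplit(RESPONSE_MARKER, 1)[-1]
    thought_match = re.search(r"<thought>(.*?)</thought>", response, re.DOTALL)
    output_match = re.search(r"<output>(.*?)</output>", response, re.DOTALL)
    thought = thought_match.group(1).strip() if thought_match else ""
    # Fall back to the whole response when no <output> tag is present.
    output = output_match.group(1).strip() if output_match else response.strip()
    return output, thought


# Simulated generation: the prompt followed by a made-up model reply.
demo_prompt = build_cot_prompt("1から10までの整数を足すと？")
demo_prediction = demo_prompt + "<thought>1+2+...+10 = 55</thought><output>55</output>"
demo_output, demo_thought = split_thought_and_output(demo_prediction)
print(demo_output)   # 55
print(demo_thought)  # 1+2+...+10 = 55
```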

This gemma2 model was trained 2x faster with Unsloth and Huggingface's TRL library.
