ryota39/RakutenAI-7B-instruct-reward

- This model was trained to classify whether an input text is a "chosen" sentence or a "rejected" sentence.
- The class probability (the last layer's logits passed through a softmax function) can be used to quantify the preference for a user input.
- Fine-tuned from Rakuten/RakutenAI-7B-instruct via LoRA on open-preference-v0.3.
- Trained in bf16 precision.
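As a minimal sketch of the softmax step described above, the snippet below turns a pair of class logits into a preference score. It assumes the classification head emits exactly two logits and that index 1 corresponds to "chosen"; the card states neither, so check the label mapping in the model's config before relying on it. The names `softmax`, `preference_score`, and `CHOSEN_INDEX`, as well as the example logits, are illustrative and not part of the model's API.

```python
import math

# Assumption: label index 1 means "chosen". The model card does not state the
# mapping, so verify it against the model's config before use.
CHOSEN_INDEX = 1


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_score(logits, chosen_index=CHOSEN_INDEX):
    """Return the probability assigned to the assumed "chosen" class."""
    return softmax(logits)[chosen_index]


# Hypothetical logits for one input text (not produced by the real model).
example_logits = [-0.3, 1.2]
print(f"preference score: {preference_score(example_logits):.4f}")
```

A score near 1 would mean the text looks like a "chosen" sentence under this assumed mapping, and a score near 0 would mean it looks like a "rejected" one.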

Metric

validation

accuracy	recall	precision	f1-score
0.9694	0.9757	0.9636	0.9696

test

accuracy	recall	precision	f1-score
0.5162	0.8822	0.5093	0.6458
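The f1-score column can be cross-checked against the precision and recall columns, since F1 is the harmonic mean of the two. The sketch below recomputes it from the values in the tables above; the function name `f1_score` and the `reported` dictionary are illustrative, not taken from the author's evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 as reported in the tables above.
reported = {
    "validation": {"precision": 0.9636, "recall": 0.9757, "f1": 0.9696},
    "test": {"precision": 0.5093, "recall": 0.8822, "f1": 0.6458},
}

for split, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{split}: computed F1 = {computed:.4f}, reported F1 = {m['f1']:.4f}")
```

Both recomputed values agree with the reported f1-scores to within rounding.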

confusion matrix
- x-axis shows ground truth
- y-axis shows prediction

Safetensors: model size 7.37B params, tensor type BF16

Collection including ryota39/RakutenAI-7B-instruct-reward

Reward Model for Japanese

An effort to build reward models with Japanese datasets • 7 items