	---
	license: cc-by-4.0
	datasets:
	- cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental
	language:
	- ja
	- en
	---

	# Model Card for "calm2-7b-chat-dpo-experimental"

	This model was obtained by applying [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to [cyberagent/calm2-7b-chat](https://huggingface.co/cyberagent/calm2-7b-chat) using the [cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental](https://huggingface.co/datasets/cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental) dataset.
	[Low-Rank Adaptation (LoRA)](https://huggingface.co/docs/peft/conceptual_guides/lora) was used for the DPO training.
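
For readers unfamiliar with DPO, the per-pair objective from the linked paper can be sketched in a few lines. This is only an illustration of the loss, not the training code used for this model: the function name `dpo_loss` and the default `beta=0.1` are assumptions, and the model card does not state the hyperparameters that were actually used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the trained policy or the frozen reference.
    `beta` is a hypothetical value chosen for illustration.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss falls below log(2) once the policy prefers the chosen response
# more strongly than the reference model does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; training pushes it lower by widening the margin in favor of the chosen response.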

	## Requirements, Usage, Chat Template

	These are the same as for [cyberagent/calm2-7b-chat](https://huggingface.co/cyberagent/calm2-7b-chat).
	The model can be run with the same code and prompts.

	```python
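	import transformers
	from packaging import version
	from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

	# Compare parsed versions; a plain string comparison would misorder
	# versions such as "4.4.0" and "4.34.1".
	assert version.parse(transformers.__version__) >= version.parse("4.34.1")

	# Load the model and tokenizer; device_map="auto" places the weights on the available device(s).
	model = AutoModelForCausalLM.from_pretrained("cyberagent/calm2-7b-chat-dpo-experimental", device_map="auto", torch_dtype="auto")
	tokenizer = AutoTokenizer.from_pretrained("cyberagent/calm2-7b-chat-dpo-experimental")
	# Print tokens as they are generated, without echoing the prompt.
	streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

	# The user message asks: "How will AI change our lives?"
	prompt = """USER: AIによって私達の暮らしはどのように変わりますか？
	ASSISTANT: """

	token_ids = tokenizer.encode(prompt, return_tensors="pt")
	output_ids = model.generate(
	    input_ids=token_ids.to(model.device),
	    max_new_tokens=300,
	    do_sample=True,
	    temperature=0.8,
	    streamer=streamer,
	)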
	```
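
The usage example hard-codes a single-turn prompt. A small helper for building such prompts, including earlier turns, might look like the sketch below. The single-turn `USER: ...` / `ASSISTANT: ` layout is taken from the example; the multi-turn layout and the `<|endoftext|>` separator are assumptions based on the chat template described for cyberagent/calm2-7b-chat and should be checked against that model card. The name `build_prompt` is hypothetical.

```python
def build_prompt(history, user_message, eos_token="<|endoftext|>"):
    """Build a calm2-style chat prompt.

    history: list of (user_message, assistant_message) pairs from earlier turns.
    user_message: the new user message that the model should answer.
    eos_token: assumed end-of-turn marker appended after each past assistant reply.
    """
    parts = []
    for past_user, past_assistant in history:
        parts.append(f"USER: {past_user}\nASSISTANT: {past_assistant}{eos_token}\n")
    # The prompt ends with an open assistant turn for the model to complete.
    parts.append(f"USER: {user_message}\nASSISTANT: ")
    return "".join(parts)


# Single turn: reproduces the layout of the prompt in the usage example.
print(build_prompt([], "AIによって私達の暮らしはどのように変わりますか？"))
```

The returned string can be passed to `tokenizer.encode(...)` in place of the hard-coded `prompt`.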

	## Experimental Results

	### ELYZA-tasks-100 (GPT-4 eval)

	To avoid randomness in the results, outputs were generated with greedy search.

	| calm2-7b-chat | calm2-7b-chat-dpo |
	| ---- | ---- |
	| 2.67 | 2.85 |
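
Greedy search can be requested from `model.generate` by turning sampling off. The helper below is a minimal sketch of such settings, not the exact evaluation configuration: the name `greedy_generation_kwargs` is hypothetical, and `max_new_tokens=300` is borrowed from the usage example rather than from the evaluation setup, which the model card does not specify.

```python
def greedy_generation_kwargs(max_new_tokens=300):
    """Keyword arguments that make `model.generate` decode greedily.

    With do_sample=False and a single beam, the highest-probability token is
    chosen at every step, so repeated runs produce the same output.
    """
    return {
        "max_new_tokens": max_new_tokens,  # assumed length limit, for illustration
        "do_sample": False,                # no sampling, hence no randomness
        "num_beams": 1,                    # a single beam, i.e. plain greedy search
    }


# Intended use with the model and token_ids from the usage example:
#   output_ids = model.generate(input_ids=token_ids.to(model.device),
#                               **greedy_generation_kwargs())
print(greedy_generation_kwargs())
```

`temperature` is left out on purpose, since it only affects sampling.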


	### Japanese MT-Bench

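	calm2-7b-chat-dpo and calm2-7b-chat were evaluated with the following sentence as the system prompt (system_message).

	"以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。" (English: "The following is a combination of an instruction that describes a task and an input that provides context. Write a response that appropriately fulfills the request.")

	This system prompt is the one [used when evaluating stabilityai/japanese-stablelm-instruct-alpha-7b](https://github.com/Stability-AI/FastChat/blob/dfb653d2cadd16017b66bbc3a25cf361031f2da3/fastchat/conversation.py#L364), used here unchanged.
	All other decoding parameters were left at their defaults, so the results are subject to randomness.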

	| | calm2-7b-chat | calm2-7b-chat-dpo |
	| ---- | ---- | ---- |
	| average | 6.1 | 6.7 |
	| extraction | 4.1 | 5.4 |
	| humanities | 8.2 | 8.4 |
	| reasoning | 3.9 | 4.3 |
	| roleplay | 6.4 | 7.0 |
	| stem | 6.3 | 6.2 |
	| writing | 7.7 | 9.1 |
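
As a quick check on the table, the short script below recomputes the column averages and the per-category differences from the listed scores. It assumes the reported average is the unweighted mean of the six categories shown, which the model card does not state explicitly; the names `SCORES`, `column_means`, and `category_deltas` are illustrative.

```python
# Per-category Japanese MT-Bench scores copied from the table:
# category -> (calm2-7b-chat, calm2-7b-chat-dpo)
SCORES = {
    "extraction": (4.1, 5.4),
    "humanities": (8.2, 8.4),
    "reasoning": (3.9, 4.3),
    "roleplay": (6.4, 7.0),
    "stem": (6.3, 6.2),
    "writing": (7.7, 9.1),
}


def column_means(scores):
    """Unweighted mean of each model's scores over the listed categories."""
    n = len(scores)
    base_mean = sum(base for base, _ in scores.values()) / n
    dpo_mean = sum(dpo for _, dpo in scores.values()) / n
    return base_mean, dpo_mean


def category_deltas(scores):
    """Score change (DPO minus base) per category, rounded to one decimal."""
    return {cat: round(dpo - base, 1) for cat, (base, dpo) in scores.items()}


base_mean, dpo_mean = column_means(SCORES)
print(f"calm2-7b-chat mean:     {base_mean:.1f}")
print(f"calm2-7b-chat-dpo mean: {dpo_mean:.1f}")
# List categories from largest gain to largest drop.
for cat, delta in sorted(category_deltas(SCORES).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:<11} {delta:+.1f}")
```

Under that assumption the means round to 6.1 and 6.7, matching the reported averages. The largest gains are in writing (+1.4) and extraction (+1.3), while stem is the only category that decreases (-0.1).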

	## Releases

	1.0: v1 release (Jan 24, 2024)

	## Author

	Yuu Jinnai (jinnai_yu@cyberagent.co.jp), Standing on the shoulders of giants