Model Card for llm-jp-13b-instruct-full-jaster-dpo
This is a human-preference-optimized (DPO) version of the native Japanese model llm-jp/llm-jp-13b-instruct-full-jaster-v1.0.
Model Details
Model type: transformer-based large language model
Total tokens seen: 300B
Parameters: 13B
Layers: 40
Hidden size: 5120
Heads: 40
Context length: 2048
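These hyperparameters can be read directly from the published configuration. A minimal inspection sketch; the base-model repository id is taken from this card, and the exact config field names depend on the underlying architecture:

from transformers import AutoConfig

# Print the architecture hyperparameters (layers, hidden size, heads, context length)
# as stored in the model's config.json.
config = AutoConfig.from_pretrained("llm-jp/llm-jp-13b-instruct-full-jaster-v1.0")
print(config)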
Training
Pre-training:
Hardware: 96 A100 40GB GPUs (MDX cluster)
Software: Megatron-DeepSpeed
Instruction tuning:
Hardware: 8 A100 40GB GPUs (MDX cluster)
Software: TRL, PEFT, and DeepSpeed
Human Preference Alignment:
Hardware: Apple MPS device, M3 Max chip, 16-core CPU, 16-core neural engine, 40-core GPU / 128G unified memory
Software: PyTorch (MPS backend; a device-selection sketch follows this list), Hugging Face Transformers, PEFT (version 0.8.2)
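The preference-alignment step ran on Apple Silicon through PyTorch's MPS backend. A minimal device-selection sketch, not taken from the original training script:

import torch

# Prefer Apple Silicon (MPS) when available, otherwise fall back to CUDA or CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
print(f"training device: {device}")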
Tokenizer
The tokenizer of this model is based on the huggingface/tokenizers Unigram byte-fallback model. The vocabulary entries were converted from llm-jp-tokenizer v2.1 (50k). Please refer to the README.md of llm-jp-tokenizer for details on the vocabulary construction procedure. A short loading sketch follows the list below.
- Model: Hugging Face Fast Tokenizer using a Unigram byte-fallback model, which requires tokenizers>=0.14.0
- Training algorithm: SentencePiece Unigram byte-fallback
- Training data: a subset of the datasets for model pre-training
- Vocabulary size: 50,570 (mixed vocabulary of Japanese, English, and source code)
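A short loading sketch; the repository id of the instruction-tuned base model is taken from this card, and the sample sentence is arbitrary:

from transformers import AutoTokenizer

# Requires tokenizers>=0.14.0 for the Unigram byte-fallback fast tokenizer.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-13b-instruct-full-jaster-v1.0")
print(tokenizer.vocab_size)                        # mixed Japanese/English/code vocabulary
print(tokenizer.tokenize("日本の首都は東京です。"))   # "The capital of Japan is Tokyo."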
Model Description
This model was aligned with human preferences using an adapter approach from the PEFT library (https://github.com/huggingface/peft). The alignment was based on Direct Preference Optimization (https://arxiv.org/abs/2305.18290).
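A minimal sketch of how such an adapter-based DPO run can be set up with TRL's DPOTrainer and PEFT's LoraConfig. The LoRA settings, DPO temperature, trainer arguments, and column mapping below are illustrative assumptions, not the values used for this model, and the signature shown follows the older TRL style in which beta and tokenizer are passed to DPOTrainer directly:

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical LoRA settings; the adapter configuration is not documented in this card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

# DPOTrainer expects "prompt", "chosen", and "rejected" columns; mapping the raw
# dataset into that format is omitted here.
train_dataset = load_dataset("shi3z/anthropic_hh_rlhf_japanese", split="train")

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base model serves as the reference
    args=TrainingArguments(output_dir="dpo-out", per_device_train_batch_size=1),
    beta=0.1,              # illustrative DPO temperature
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()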
Training Data
The data used for DPO was a Japanese translation of the original Anthropic Helpful and Harmless (HH-RLHF) dataset (https://huggingface.co/datasets/Anthropic/hh-rlhf) for Reinforcement Learning from Human Feedback (https://arxiv.org/abs/2204.05862). The translation is available here: https://huggingface.co/datasets/shi3z/anthropic_hh_rlhf_japanese
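The translated preference pairs can be inspected directly. A small sketch; the column names are whatever the translated dataset ships with and are not verified here:

from datasets import load_dataset

# Peek at the Japanese HH-RLHF translation used for preference optimization.
ds = load_dataset("shi3z/anthropic_hh_rlhf_japanese", split="train")
print(ds.column_names)
print(ds[0])  # one chosen/rejected preference pair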
Direct Use
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "llmjp/llm-jp-13b-instruct-full-jaster-dpo"

# Load the base model together with the DPO LoRA adapter, quantized to 4-bit.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt: "Question: What is the capital of Japan?\n\nAnswer:"
inputs = tokenizer.encode("質問:日本の首都はどこですか?\n\n答え:", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
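If the adapter should be folded into the base weights (for example, to serve the model without PEFT at inference time), PEFT's merge_and_unload can be used. A sketch, assuming the model was loaded in half precision rather than 4-bit, with an illustrative output path:

# Merge the DPO LoRA adapter into the base model and save a standalone copy.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("llm-jp-13b-instruct-full-jaster-dpo-merged")
tokenizer.save_pretrained("llm-jp-13b-instruct-full-jaster-dpo-merged")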
Author
Stephen Fitz (https://huggingface.co/stephenfitz) for LLMJP (https://huggingface.co/llm-jp)