`Gemma 2 Baku 2B Instruct (rinna/gemma-2-baku-2b-it)`

Overview

The model is an instruction-tuned variant of rinna/gemma-2-baku-2b, utilizing Chat Vector and Odds Ratio Preference Optimization (ORPO) for fine-tuning. It adheres to the gemma-2 chat format.
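
For readers unfamiliar with the gemma-2 chat format, the sketch below builds a prompt by hand. The helper name `format_gemma2_prompt` is hypothetical and exists only for illustration; the template shipped with the tokenizer (used via `tokenizer.apply_chat_template`) remains the authoritative definition, and this sketch assumes only `user` and `assistant` roles.

```python
def format_gemma2_prompt(chat, add_generation_prompt=True, bos_token="<bos>"):
    """Hand-rolled approximation of the gemma-2 chat format (illustrative only)."""
    # Gemma 2 labels assistant turns as "model"; other roles (e.g. "system")
    # are not handled here and raise a KeyError.
    role_names = {"user": "user", "assistant": "model"}
    prompt = bos_token
    for message in chat:
        role = role_names[message["role"]]
        content = message["content"].strip()
        prompt += f"<start_of_turn>{role}\n{content}<end_of_turn>\n"
    if add_generation_prompt:
        # Open a model turn so that generation continues as the assistant.
        prompt += "<start_of_turn>model\n"
    return prompt


print(format_gemma2_prompt([{"role": "user", "content": "こんにちは"}]))
```

In practice, prefer `tokenizer.apply_chat_template`, which stays in sync with the tokenizer's special tokens.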

Size	Continual Pre-Training	Instruction-Tuning
2B	Gemma 2 Baku 2B [HF]	Gemma 2 Baku 2B Instruct [HF]

Model architecture

A 26-layer, 2304-hidden-size transformer-based language model. Please refer to the Gemma 2 Model Card for detailed information on the model's architecture.

Training

Model merging. The base model was endowed with instruction-following capabilities through a chat vector addition process. The chat vector was derived by subtracting the parameter vectors of google/gemma-2-2b from google/gemma-2-2b-it, as follows.
```
  rinna/gemma-2-baku-2b + 1.0 * (google/gemma-2-2b-it - google/gemma-2-2b)
```
The embedding layer was excluded from both the subtraction and the addition of parameter vectors.
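
The merge can be sketched as plain arithmetic over parameter dictionaries. This is a minimal illustration, not rinna's actual merging script: the function name `apply_chat_vector` is hypothetical, and matching embedding parameters by the substring `embed_tokens` is an assumption about parameter naming. Skipped parameters simply keep the value from the continually pre-trained model.

```python
def apply_chat_vector(target, base, instruct, scale=1.0, skip=("embed_tokens",)):
    """Return target + scale * (instruct - base), leaving skipped parameters untouched.

    Each argument maps parameter names to values that support +, - and *
    (for example torch tensors taken from model.state_dict()).
    """
    merged = {}
    for name, weight in target.items():
        if any(key in name for key in skip):
            # Assumption: embedding parameters are identified by name and kept as-is.
            merged[name] = weight
        else:
            merged[name] = weight + scale * (instruct[name] - base[name])
    return merged


# Toy example with scalars standing in for weight tensors.
target = {"model.embed_tokens.weight": 5.0, "model.layers.0.mlp.up_proj.weight": 1.0}
base = {"model.embed_tokens.weight": 2.0, "model.layers.0.mlp.up_proj.weight": 0.5}
instruct = {"model.embed_tokens.weight": 3.0, "model.layers.0.mlp.up_proj.weight": 0.75}
merged = apply_chat_vector(target, base, instruct)
```

With real checkpoints, `target`, `base`, and `instruct` would correspond to the state dicts of rinna/gemma-2-baku-2b, google/gemma-2-2b, and google/gemma-2-2b-it respectively.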

ORPO was applied using a subset of the following dataset to further refine the performance of the merged model.
- rinna's internal dataset
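
As background, ORPO (Hong et al., 2024) adds an odds-ratio penalty to the usual supervised loss on the preferred response. The sketch below shows that per-example objective, assuming length-normalized sequence log-probabilities are already computed. The function names and the weight `lam=0.1` are illustrative; the hyperparameters and training code used for this model are not stated in this card.

```python
import math


def log_odds(logp):
    """log(p / (1 - p)) for p = exp(logp), where logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    """Per-example ORPO loss: SFT loss on the chosen response plus lam * odds-ratio loss."""
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_loss = math.log1p(math.exp(-log_odds_ratio))
    sft_loss = -logp_chosen
    return sft_loss + lam * odds_ratio_loss


# The penalty shrinks as the chosen response becomes more likely than the rejected one.
tied = orpo_loss(math.log(0.5), math.log(0.5))
preferred = orpo_loss(math.log(0.5), math.log(0.1))
```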

Contributors

Release date

October 3, 2024

Benchmarking

Please refer to rinna's LM benchmark page (Sheet 20241003).

How to use the model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "rinna/gemma-2-baku-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
    attn_implementation="eager",  # see the note on bfloat16 below
)

# Single-turn conversation in the gemma-2 chat format.
chat = [
    { "role": "user", "content": "西田幾多郎とはどんな人物ですか？" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# The chat template already inserts the BOS token, so do not add special tokens again.
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

It is recommended to use eager attention when conducting batch inference under bfloat16 precision. Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
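
A batched variant of the example above might look like the following sketch. The helper names `build_chats` and `generate_batch` are hypothetical, and left padding is an assumption made here (a common choice for decoder-only generation), not something this card prescribes. The key point is that `attn_implementation="eager"` is kept because the padded batch is run under bfloat16.

```python
def build_chats(questions):
    """Wrap each question in a single-turn chat."""
    return [[{"role": "user", "content": q}] for q in questions]


def generate_batch(questions, model_id="rinna/gemma-2-baku-2b-it", max_new_tokens=256):
    # Heavy dependencies are imported here so build_chats stays usable without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.padding_side = "left"  # assumption: pad on the left for generation
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="cuda",
        torch_dtype=torch.bfloat16,
        attn_implementation="eager",  # avoids NaNs with padded bfloat16 inputs
    )

    prompts = [
        tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
        for chat in build_chats(questions)
    ]
    # padding=True pads to the longest prompt and returns an attention mask.
    batch = tokenizer(
        prompts, padding=True, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(**batch, max_new_tokens=max_new_tokens)

    # All rows share the same padded prompt length, so slice it off uniformly.
    prompt_len = batch["input_ids"].shape[-1]
    return [tokenizer.decode(out[prompt_len:], skip_special_tokens=True) for out in outputs]


# Example call (requires a CUDA GPU and downloads the model):
# answers = generate_batch(["西田幾多郎とはどんな人物ですか？", "京都学派とは何ですか？"])
```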

Tokenization

The model uses the original google/gemma-2-2b-it tokenizer.

How to cite

@misc{rinna-gemma-2-baku-2b-it,
    title = {rinna/gemma-2-baku-2b-it},
    author = {Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/gemma-2-baku-2b-it}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@article{gemma-2-2024,
    title = {Gemma 2},
    url = {https://www.kaggle.com/models/google/gemma-2},
    publisher = {Kaggle},
    author = {Gemma Team},
    year = {2024}
}

@article{huang2023chat,
    title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
    author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
    year = {2023},
    url = {https://arxiv.org/abs/2310.04799}
}

@article{hong2024orpo,
    title = {ORPO: Monolithic Preference Optimization without Reference Model},
    author = {Hong, Jiwoo and Lee, Noah and Thorne, James},
    year = {2024},
    url = {https://arxiv.org/abs/2403.07691}
}

License

Gemma Terms of Use
