---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: gemma
language:
- ja
- en
tags:
- gemma2
- conversational
base_model:
- google/gemma-2-2b
- google/gemma-2-2b-it
- rinna/gemma-2-baku-2b
base_model_relation: merge
pipeline_tag: text-generation
library_name: transformers
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/gemma-2-baku-2b-it-GGUF
This is a quantized version of [rinna/gemma-2-baku-2b-it](https://huggingface.co/rinna/gemma-2-baku-2b-it), created using llama.cpp.

# Original Model Card

# `Gemma 2 Baku 2B Instruct (rinna/gemma-2-baku-2b-it)`

![rinna-icon](./rinna.png)

# Overview

The model is an instruction-tuned variant of [rinna/gemma-2-baku-2b](https://huggingface.co/rinna/gemma-2-baku-2b), built by Chat Vector model merging followed by Odds Ratio Preference Optimization (ORPO) fine-tuning. It follows the Gemma 2 chat format.
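
To make the chat format concrete, here is a minimal sketch that renders a conversation into Gemma 2 turn markup by hand. It assumes the standard Gemma 2 template (a leading `<bos>`, `<start_of_turn>`/`<end_of_turn>` markers, no system role, and the assistant turn named `model`). The helper `render_gemma2_prompt` is hypothetical; `tokenizer.apply_chat_template`, as used in the usage example, remains the authoritative way to build prompts.

```python
def render_gemma2_prompt(chat, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into Gemma 2 chat markup.

    Hypothetical helper for illustration only. It assumes the Gemma 2
    template, which has no system role and calls the assistant turn "model".
    """
    parts = ["<bos>"]
    for message in chat:
        role = message["role"]
        if role == "assistant":
            role = "model"  # Gemma 2 names the assistant turn "model"
        elif role != "user":
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


# "What kind of person was Kitaro Nishida?"
chat = [{"role": "user", "content": "西田幾多郎とはどんな人物ですか?"}]
print(render_gemma2_prompt(chat))
```

Because the rendered prompt already starts with `<bos>`, it should be tokenized with `add_special_tokens=False`, as the usage example does.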

| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |

* **Model architecture**

A 26-layer transformer-based language model with a hidden size of 2304. Please refer to the [Gemma 2 Model Card](https://www.kaggle.com/models/google/gemma-2/) for detailed information on the model's architecture.

* **Training**

**Model merging.** The base model was given instruction-following capabilities by adding a chat vector. The chat vector was obtained by subtracting the parameters of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) from those of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it), as follows.

~~~~text
rinna/gemma-2-baku-2b + 1.0 * (google/gemma-2-2b-it - google/gemma-2-2b)
~~~~

The embedding layer was excluded from both the subtraction and the addition of parameter vectors.
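
The formula above is element-wise arithmetic over matching parameters. The following minimal sketch assumes each checkpoint is a mapping from parameter names to flat lists of floats (real checkpoints hold tensors) and that the embedding matrix is named `model.embed_tokens.weight`. That key, the helper name `apply_chat_vector`, and the toy values are illustrative assumptions, not rinna's actual merging script.

```python
# Assumed name of the embedding matrix in a Hugging Face Gemma 2 checkpoint.
EMBEDDING_KEYS = {"model.embed_tokens.weight"}


def apply_chat_vector(base, instruct, target, scale=1.0, skip=EMBEDDING_KEYS):
    """Return target + scale * (instruct - base), parameter by parameter.

    Each argument maps parameter names to flat lists of floats. Real
    checkpoints hold tensors, but the element-wise arithmetic is the same.
    Parameters listed in `skip` are copied from `target` unchanged.
    """
    merged = {}
    for name, weights in target.items():
        if name in skip:
            merged[name] = list(weights)  # embeddings keep the target's values
            continue
        chat_vector = [i - b for i, b in zip(instruct[name], base[name])]
        merged[name] = [t + scale * d for t, d in zip(weights, chat_vector)]
    return merged


# Toy stand-ins for google/gemma-2-2b (base), google/gemma-2-2b-it (instruct)
# and rinna/gemma-2-baku-2b (target); names and values are illustrative only.
base = {"model.embed_tokens.weight": [0.1, 0.2], "model.layers.0.mlp.down_proj.weight": [1.0, 2.0]}
instruct = {"model.embed_tokens.weight": [0.5, 0.5], "model.layers.0.mlp.down_proj.weight": [1.5, 1.0]}
target = {"model.embed_tokens.weight": [0.3, 0.4], "model.layers.0.mlp.down_proj.weight": [3.0, 4.0]}

print(apply_chat_vector(base, instruct, target))
```

The `scale` argument corresponds to the `1.0` coefficient in the formula; skipping the embeddings keeps the target model's own token representations.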

**ORPO** was then applied to the merged model, using a subset of the following dataset, to further refine its performance.

- rinna's internal dataset

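The ORPO objective (Hong et al., 2024, listed under References) adds a penalty on the log odds ratio between a preferred and a rejected response to the usual supervised loss on the preferred response. Here the odds of a response are p / (1 - p), where p is its length-normalized likelihood. The sketch below computes that loss for a single preference pair in plain Python. The helper names and the weight `lam=0.1` are assumptions for illustration, because the card does not describe rinna's training code or hyperparameters.

```python
import math


def log_odds(avg_logp):
    """Return log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def orpo_loss(chosen_avg_logp, rejected_avg_logp, lam=0.1):
    """ORPO loss for one preference pair (Hong et al., 2024).

    Inputs are length-normalized (mean per-token) log-likelihoods of the
    chosen and rejected responses under the model being trained. `lam`
    weights the odds-ratio term; 0.1 is an arbitrary illustrative value.
    """
    sft_loss = -chosen_avg_logp  # negative log-likelihood of the chosen response
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    or_loss = softplus(-log_odds_ratio)  # equals -log(sigmoid(log_odds_ratio))
    return sft_loss + lam * or_loss


# Favoring the chosen response lowers the loss; favoring the rejected one raises it.
print(orpo_loss(-0.5, -2.0))
print(orpo_loss(-2.0, -0.5))
```

Because the odds-ratio term needs no reference model, ORPO can be applied directly to the merged model in a single training stage.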
* **Contributors**

- [Xinqi Chen](https://huggingface.co/Keely0419)
- [Toshiaki Wakatsuki](https://huggingface.co/t-w)
- [Kei Sawada](https://huggingface.co/keisawada)

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~~python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "rinna/gemma-2-baku-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
    attn_implementation="eager",  # see the note on eager attention below
)

# "What kind of person was Kitaro Nishida?"
chat = [
    { "role": "user", "content": "西田幾多郎とはどんな人物ですか?" },
]
# The chat template already inserts <bos> at the start of the prompt.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# add_special_tokens=False avoids adding a second <bos> token.
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
~~~~

Eager attention is recommended for batch inference in bfloat16 precision.
At the time of writing, Gemma 2 produces NaN values for padded input sequences when the default attention implementation (`torch.nn.functional.scaled_dot_product_attention`) is used with bfloat16.

---

# Tokenization
The model uses the original [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) tokenizer.

---

# How to cite
```bibtex
@misc{rinna-gemma-2-baku-2b-it,
    title = {rinna/gemma-2-baku-2b-it},
    author = {Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/gemma-2-baku-2b-it}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@article{gemma-2-2024,
    title = {Gemma 2},
    url = {https://www.kaggle.com/models/google/gemma-2},
    publisher = {Kaggle},
    author = {Gemma Team},
    year = {2024}
}

@article{huang2023chat,
    title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
    author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tsai, Richard Tzong-Han and Lee, Hung-yi},
    year = {2023},
    url = {https://arxiv.org/abs/2310.04799}
}

@article{hong2024orpo,
    title = {ORPO: Monolithic Preference Optimization without Reference Model},
    author = {Hong, Jiwoo and Lee, Noah and Thorne, James},
    year = {2024},
    url = {https://arxiv.org/abs/2403.07691}
}
```
---

# License
[Gemma Terms of Use](https://ai.google.dev/gemma/terms)