Gregor committed
Commit 004e9e8
1 Parent(s): 2242dff

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,227 @@
---
language:
- en
- multilingual
license: mit
tags:
- vision
- image-to-text
- image-captioning
- visual-question-answering
pipeline_tag: image-to-text
inference: false
datasets:
- Gregor/mblip-train
---

# mBLIP BLOOMZ-7B

This is the model checkpoint for our work [mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs](https://arxiv.org/abs/2307.06930).

## Model description
mBLIP is a [BLIP-2](https://arxiv.org/abs/2301.12597) model which consists of three sub-models: a Vision Transformer (ViT), a Query-Transformer (Q-Former), and a large language model (LLM).

The Q-Former and ViT have both been initialized from an English BLIP-2 checkpoint ([blip2-flan-t5-xl](https://huggingface.co/Salesforce/blip2-flan-t5-xl)) and then re-aligned
to the multilingual LLM ([bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)) using a [multilingual task mixture](https://huggingface.co/datasets/Gregor/mblip-train).

<img src="https://github.com/gregor-ge/mBLIP/blob/main/architecture.png"
alt="The mBLIP architecture" width="600"/>
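
As a minimal sketch of this three-part layout, the sub-models can be inspected directly after loading the checkpoint (the attribute names follow the Hugging Face BLIP-2 implementation):

```python
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained("Gregor/mblip-bloomz-7b")

print(type(model.vision_model).__name__)    # ViT image encoder
print(type(model.qformer).__name__)         # Q-Former bridging vision and language
print(type(model.language_model).__name__)  # BLOOMZ-7B causal LLM
print(model.config.num_query_tokens)        # 32 learned query tokens
```

The Q-Former uses the 32 learned query tokens to extract image features, which are then projected into the LLM's input embedding space.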

This allows the model to be used for tasks like:

- image captioning
- visual question answering (VQA)

in 96 languages.

#### Languages
mBLIP was trained on the following 96 languages:

```
af, am, ar, az, be, bg, bn, ca, ceb, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fil, fr, ga, gd, gl, gu, ha, hi, ht, hu, hy, id, ig, is, it, iw, ja, jv, ka, kk, km, kn, ko, ku, ky, lb, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, no, ny, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, sm, sn, so, sq, sr, st, su, sv, sw, ta, te, tg, th, tr, uk, ur, uz, vi, xh, yi, yo, zh, zu
```

## Direct Use and Downstream Use

You can use the raw model for conditional text generation given an image and a text prompt in a zero-shot setup, or
alternatively finetune it for downstream applications.
We strongly recommend applying LoRA to the LLM when finetuning and using bf16 as the data type; standard fp16 can cause NaN loss (a sketch of this setup is shown below).

See [our repository](https://github.com/gregor-ge/mBLIP) for the code used to train and finetune this model.
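
The sketch below uses the [PEFT](https://github.com/huggingface/peft) library; the LoRA hyperparameters and target modules are illustrative placeholders, not the exact settings from our training code:

```python
# Sketch: load in bf16 and apply LoRA only to the LLM (illustrative values).
import torch
from peft import LoraConfig, get_peft_model
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained(
    "Gregor/mblip-bloomz-7b", torch_dtype=torch.bfloat16
)

# Freeze the vision encoder; the Q-Former remains trainable.
for param in model.vision_model.parameters():
    param.requires_grad = False

# LoRA on the BLOOM attention projections of the language model.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model.language_model = get_peft_model(model.language_model, lora_config)

# Train with your usual loop or Trainer in bf16 (not fp16, which can produce NaN loss).
```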

When using batched input, use left padding!
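
For example, a batched call can look like the following sketch (the prompts are illustrative; the tokenizer of this checkpoint is already configured with `padding_side="left"`, so `padding=True` pads on the left):

```python
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Gregor/mblip-bloomz-7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Gregor/mblip-bloomz-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Two prompts of different lengths for the same image; padding=True left-pads the shorter one.
prompts = ["Describe the image in German.", "What is the woman doing?"]
inputs = processor([raw_image, raw_image], prompts, padding=True, return_tensors="pt").to("cuda", torch.bfloat16)

out = model.generate(**inputs)
print(processor.batch_decode(out, skip_special_tokens=True))
```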

## Bias, Risks, Limitations, and Ethical Considerations

While mBLIP can in theory work with up to 100 languages, in practice we expect the best results when prompting it in high-resource languages
like English, German, or Spanish.

mBLIP inherits the risks, limitations, and biases of the models used to initialize it.
mBLIP has not been tested in real-world applications and should not be directly deployed in any application. Researchers should first carefully assess the safety and fairness of the model in relation to the specific context in which it would be deployed.

### How to use

For code examples, we refer to the BLIP-2 [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).

#### Running the model on CPU

<details>
<summary> Click to expand </summary>

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Gregor/mblip-bloomz-7b")
model = Blip2ForConditionalGeneration.from_pretrained("Gregor/mblip-bloomz-7b")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "Describe the image in German."
inputs = processor(raw_image, question, return_tensors="pt")

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>

#### Running the model on GPU

##### In full precision

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Gregor/mblip-bloomz-7b")
model = Blip2ForConditionalGeneration.from_pretrained("Gregor/mblip-bloomz-7b", device_map="auto")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "Describe the image in German."
inputs = processor(raw_image, question, return_tensors="pt").to("cuda")

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>

##### In half precision (`bfloat16`)

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Gregor/mblip-bloomz-7b")
model = Blip2ForConditionalGeneration.from_pretrained("Gregor/mblip-bloomz-7b", torch_dtype=torch.bfloat16, device_map="auto")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "Describe the image in German."
inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.bfloat16)

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>

##### In 8-bit precision (`int8`)
> **Important:** Paper results only use int8 for the LLM weights, while this loads all weights in int8.
> We see that this gives slightly worse results, but loading only some model parts in int8 is currently not supported by HuggingFace.

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate bitsandbytes
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Gregor/mblip-bloomz-7b")
model = Blip2ForConditionalGeneration.from_pretrained("Gregor/mblip-bloomz-7b", load_in_8bit=True, device_map="auto")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "Describe the image in German."
inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.bfloat16)

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>


##### In 4-bit precision (`int4`)
> **Important:** Paper results only use int4 for the LLM weights, while this loads all weights in int4.
> We see that this gives slightly worse results, but loading only some model parts in int4 is currently not supported by HuggingFace.

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate bitsandbytes
import torch
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Gregor/mblip-bloomz-7b")
model = Blip2ForConditionalGeneration.from_pretrained("Gregor/mblip-bloomz-7b",
                                                      load_in_4bit=True,
                                                      bnb_4bit_quant_type="nf4",
                                                      bnb_4bit_use_double_quant=False,
                                                      bnb_4bit_compute_dtype=torch.bfloat16,
                                                      device_map="auto")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "Describe the image in German."
inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.bfloat16)

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
</details>

## Citation
If you use our model, please cite the following:
```
@article{geigle2023mblip,
  author       = {Gregor Geigle and
                  Abhay Jain and
                  Radu Timofte and
                  Goran Glava\v{s}},
  title        = {mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs},
  journal      = {arXiv},
  volume       = {abs/2307.06930},
  year         = {2023},
  url          = {https://arxiv.org/abs/2307.06930},
  eprinttype   = {arXiv},
  eprint       = {2307.06930},
}
```
config.json ADDED
@@ -0,0 +1,260 @@
{
  "_commit_hash": "cc2bb7bce2f7d4d1c37753c7e9c05a443a226614",
  "architectures": [
    "mBLIP"
  ],
  "initializer_factor": 1.0,
  "initializer_range": 0.02,
  "is_encoder_decoder": true,
  "model_type": "blip-2",
  "num_query_tokens": 32,
  "qformer_config": {
    "_name_or_path": "",
    "add_cross_attention": false,
    "architectures": null,
    "attention_probs_dropout_prob": 0.1,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "classifier_dropout": null,
    "cross_attention_frequency": 2,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_hidden_size": 1408,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": null,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_eps": 1e-12,
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 512,
    "min_length": 0,
    "model_type": "blip_2_qformer",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 12,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 12,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 0,
    "position_embedding_type": "absolute",
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": null,
    "torchscript": false,
    "transformers_version": "4.31.0",
    "typical_p": 1.0,
    "use_bfloat16": false,
    "vocab_size": 30522
  },
  "text_config": {
    "_name_or_path": "/media/gregor/DATA/projects/wuerzburg/mblip/checkpoints/bloomz/08_03_2023_04_31_11-1-79283",
    "add_cross_attention": false,
    "apply_residual_connection_post_layernorm": false,
    "architectures": [
      "BloomForCausalLM"
    ],
    "attention_dropout": 0.0,
    "attention_softmax_in_fp32": true,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bias_dropout_fusion": true,
    "bos_token_id": 1,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 2,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_dropout": 0.0,
    "hidden_size": 4096,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_epsilon": 1e-05,
    "length_penalty": 1.0,
    "masked_softmax_fusion": true,
    "max_length": 20,
    "min_length": 0,
    "model_type": "bloom",
    "n_head": 32,
    "n_inner": null,
    "n_layer": 30,
    "no_repeat_ngram_size": 0,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_return_sequences": 1,
    "offset_alibi": 100,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 3,
    "prefix": null,
    "pretraining_tp": 4,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "seq_length": 2048,
    "skip_bias_add": true,
    "skip_bias_add_qkv": false,
    "slow_but_exact": false,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "float16",
    "torchscript": false,
    "transformers_version": "4.31.0",
    "typical_p": 1.0,
    "unk_token_id": 0,
    "use_bfloat16": false,
    "use_cache": true,
    "vocab_size": 250880
  },
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": null,
  "use_decoder_only_language_model": true,
  "vision_config": {
    "_name_or_path": "",
    "add_cross_attention": false,
    "architectures": null,
    "attention_dropout": 0.0,
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "dropout": 0.0,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": null,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "gelu",
    "hidden_size": 1408,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "image_size": 224,
    "initializer_factor": 1.0,
    "initializer_range": 1e-10,
    "intermediate_size": 6144,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_eps": 1e-06,
    "length_penalty": 1.0,
    "max_length": 20,
    "min_length": 0,
    "model_type": "blip_2_vision_model",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 16,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_channels": 3,
    "num_hidden_layers": 39,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": null,
    "patch_size": 14,
    "prefix": null,
    "problem_type": null,
    "projection_dim": 512,
    "pruned_heads": {},
    "qkv_bias": true,
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": null,
    "torchscript": false,
    "transformers_version": "4.31.0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}
pytorch_model-00001-of-00002.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f5c563a6a96703e09ad0d462d9f60c6ae398161211050e29f461a4f6fce8f6fd
size 9923195383
pytorch_model-00002-of-00002.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee92b4560a139c74b4551406196e1588b249e25fde47654a9ccccb8ea9345516
size 8592287959
pytorch_model.bin.index.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>"
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:17a208233d2ee8d8c83b23bc214df737c44806a1919f444e89b31e586cd956ba
size 14500471
tokenizer_config.json ADDED
@@ -0,0 +1,11 @@
{
  "add_prefix_space": false,
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<pad>",
  "padding_side": "left",
  "tokenizer_class": "BloomTokenizer",
  "unk_token": "<unk>"
}