JingyaHuang committed on
Commit
b05d834
1 Parent(s): 57c38d0

update model

config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "temp/dummy/bert/BertModel",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 32,
+   "initializer_range": 0.02,
+   "intermediate_size": 37,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 4,
+   "num_hidden_layers": 5,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.25.0.dev0",
+   "type_vocab_size": 16,
+   "use_cache": true,
+   "vocab_size": 1124
+ }
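The config above appears to describe a tiny, randomly sized BERT (hidden size 32, 5 layers, 4 heads) meant for testing rather than real inference. As a minimal sketch of how such a config can be inspected with `transformers` — the repository id below is a hypothetical placeholder, not taken from this commit:

```python
# Minimal sketch: load and inspect the tiny BERT config committed above.
# "<this-repo-id>" is a hypothetical placeholder for this repository's id.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("<this-repo-id>")
print(config.model_type)           # "bert"
print(config.hidden_size)          # 32
print(config.num_hidden_layers)    # 5
print(config.num_attention_heads)  # 4
```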
create_model.py ADDED
@@ -0,0 +1,10 @@
+ # from transformers import AutoConfig
+
+ # from modeling.modeling_bert import BertCustomLMHeadModel
+
+ # cfg = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-BertModel")
+
+ # BertCustomLMHeadModel.register_for_auto_class("AutoModelForSequenceClassification")
+
+ # model = BertCustomLMHeadModel(cfg)
+ # model.save_pretrained("/home/Jingya/hf_internship/tiny-testing-gpt2-remote-code")
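The commented-out script above sketches how the dummy model was presumably created: the custom `BertCustomLMHeadModel` from `modeling_bert.py` is registered for `AutoModelForSequenceClassification`, so the saved checkpoint carries an `auto_map` entry pointing at the hub-hosted code. A minimal sketch of loading such a remote-code checkpoint, again with a hypothetical placeholder repository id:

```python
# Minimal sketch: load the custom class through the Auto API.
# Code hosted on the Hub only runs when trust_remote_code=True is passed.
# "<this-repo-id>" is a hypothetical placeholder for this repository's id.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "<this-repo-id>",
    trust_remote_code=True,
)
print(type(model).__name__)  # expected to be "BertCustomLMHeadModel"
```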
modeling_bert.py ADDED
@@ -0,0 +1,1894 @@
1
+ # coding=utf-8
2
+ # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """PyTorch BERT model."""
17
+
18
+
19
+ import math
20
+ import os
21
+ import warnings
22
+ from dataclasses import dataclass
23
+ from typing import List, Optional, Tuple, Union
24
+
25
+ import torch
26
+ import torch.utils.checkpoint
27
+ from torch import nn
28
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
29
+
30
+ from ...activations import ACT2FN
31
+ from ...modeling_outputs import (
32
+ BaseModelOutputWithPastAndCrossAttentions,
33
+ BaseModelOutputWithPoolingAndCrossAttentions,
34
+ CausalLMOutputWithCrossAttentions,
35
+ MaskedLMOutput,
36
+ MultipleChoiceModelOutput,
37
+ NextSentencePredictorOutput,
38
+ QuestionAnsweringModelOutput,
39
+ SequenceClassifierOutput,
40
+ TokenClassifierOutput,
41
+ )
42
+ from ...modeling_utils import PreTrainedModel
43
+ from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
44
+ from ...utils import (
45
+ ModelOutput,
46
+ add_code_sample_docstrings,
47
+ add_start_docstrings,
48
+ add_start_docstrings_to_model_forward,
49
+ logging,
50
+ replace_return_docstrings,
51
+ )
52
+ from .configuration_bert import BertConfig
53
+
54
+
55
+ logger = logging.get_logger(__name__)
56
+
57
+ _CHECKPOINT_FOR_DOC = "bert-base-uncased"
58
+ _CONFIG_FOR_DOC = "BertConfig"
59
+
60
+ # TokenClassification docstring
61
+ _CHECKPOINT_FOR_TOKEN_CLASSIFICATION = "dbmdz/bert-large-cased-finetuned-conll03-english"
62
+ _TOKEN_CLASS_EXPECTED_OUTPUT = (
63
+ "['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', 'I-LOC'] "
64
+ )
65
+ _TOKEN_CLASS_EXPECTED_LOSS = 0.01
66
+
67
+ # QuestionAnswering docstring
68
+ _CHECKPOINT_FOR_QA = "deepset/bert-base-cased-squad2"
69
+ _QA_EXPECTED_OUTPUT = "'a nice puppet'"
70
+ _QA_EXPECTED_LOSS = 7.41
71
+ _QA_TARGET_START_INDEX = 14
72
+ _QA_TARGET_END_INDEX = 15
73
+
74
+ # SequenceClassification docstring
75
+ _CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION = "textattack/bert-base-uncased-yelp-polarity"
76
+ _SEQ_CLASS_EXPECTED_OUTPUT = "'LABEL_1'"
77
+ _SEQ_CLASS_EXPECTED_LOSS = 0.01
78
+
79
+
80
+ BERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
81
+ "bert-base-uncased",
82
+ "bert-large-uncased",
83
+ "bert-base-cased",
84
+ "bert-large-cased",
85
+ "bert-base-multilingual-uncased",
86
+ "bert-base-multilingual-cased",
87
+ "bert-base-chinese",
88
+ "bert-base-german-cased",
89
+ "bert-large-uncased-whole-word-masking",
90
+ "bert-large-cased-whole-word-masking",
91
+ "bert-large-uncased-whole-word-masking-finetuned-squad",
92
+ "bert-large-cased-whole-word-masking-finetuned-squad",
93
+ "bert-base-cased-finetuned-mrpc",
94
+ "bert-base-german-dbmdz-cased",
95
+ "bert-base-german-dbmdz-uncased",
96
+ "cl-tohoku/bert-base-japanese",
97
+ "cl-tohoku/bert-base-japanese-whole-word-masking",
98
+ "cl-tohoku/bert-base-japanese-char",
99
+ "cl-tohoku/bert-base-japanese-char-whole-word-masking",
100
+ "TurkuNLP/bert-base-finnish-cased-v1",
101
+ "TurkuNLP/bert-base-finnish-uncased-v1",
102
+ "wietsedv/bert-base-dutch-cased",
103
+ # See all BERT models at https://huggingface.co/models?filter=bert
104
+ ]
105
+
106
+
107
+ def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
108
+ """Load tf checkpoints in a pytorch model."""
109
+ try:
110
+ import re
111
+
112
+ import numpy as np
113
+ import tensorflow as tf
114
+ except ImportError:
115
+ logger.error(
116
+ "Loading a TensorFlow model in PyTorch, requires TensorFlow to be installed. Please see "
117
+ "https://www.tensorflow.org/install/ for installation instructions."
118
+ )
119
+ raise
120
+ tf_path = os.path.abspath(tf_checkpoint_path)
121
+ logger.info(f"Converting TensorFlow checkpoint from {tf_path}")
122
+ # Load weights from TF model
123
+ init_vars = tf.train.list_variables(tf_path)
124
+ names = []
125
+ arrays = []
126
+ for name, shape in init_vars:
127
+ logger.info(f"Loading TF weight {name} with shape {shape}")
128
+ array = tf.train.load_variable(tf_path, name)
129
+ names.append(name)
130
+ arrays.append(array)
131
+
132
+ for name, array in zip(names, arrays):
133
+ name = name.split("/")
134
+ # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculate m and v
135
+ # which are not required for using pretrained model
136
+ if any(
137
+ n in ["adam_v", "adam_m", "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1", "global_step"]
138
+ for n in name
139
+ ):
140
+ logger.info(f"Skipping {'/'.join(name)}")
141
+ continue
142
+ pointer = model
143
+ for m_name in name:
144
+ if re.fullmatch(r"[A-Za-z]+_\d+", m_name):
145
+ scope_names = re.split(r"_(\d+)", m_name)
146
+ else:
147
+ scope_names = [m_name]
148
+ if scope_names[0] == "kernel" or scope_names[0] == "gamma":
149
+ pointer = getattr(pointer, "weight")
150
+ elif scope_names[0] == "output_bias" or scope_names[0] == "beta":
151
+ pointer = getattr(pointer, "bias")
152
+ elif scope_names[0] == "output_weights":
153
+ pointer = getattr(pointer, "weight")
154
+ elif scope_names[0] == "squad":
155
+ pointer = getattr(pointer, "classifier")
156
+ else:
157
+ try:
158
+ pointer = getattr(pointer, scope_names[0])
159
+ except AttributeError:
160
+ logger.info(f"Skipping {'/'.join(name)}")
161
+ continue
162
+ if len(scope_names) >= 2:
163
+ num = int(scope_names[1])
164
+ pointer = pointer[num]
165
+ if m_name[-11:] == "_embeddings":
166
+ pointer = getattr(pointer, "weight")
167
+ elif m_name == "kernel":
168
+ array = np.transpose(array)
169
+ try:
170
+ if pointer.shape != array.shape:
171
+ raise ValueError(f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched")
172
+ except AssertionError as e:
173
+ e.args += (pointer.shape, array.shape)
174
+ raise
175
+ logger.info(f"Initialize PyTorch weight {name}")
176
+ pointer.data = torch.from_numpy(array)
177
+ return model
178
+
179
+
180
+ class BertEmbeddings(nn.Module):
181
+ """Construct the embeddings from word, position and token_type embeddings."""
182
+
183
+ def __init__(self, config):
184
+ super().__init__()
185
+ self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
186
+ self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
187
+ self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
188
+
189
+ # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
190
+ # any TensorFlow checkpoint file
191
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
192
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
193
+ # position_ids (1, len position emb) is contiguous in memory and exported when serialized
194
+ self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
195
+ self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)))
196
+ self.register_buffer(
197
+ "token_type_ids", torch.zeros(self.position_ids.size(), dtype=torch.long), persistent=False
198
+ )
199
+
200
+ def forward(
201
+ self,
202
+ input_ids: Optional[torch.LongTensor] = None,
203
+ token_type_ids: Optional[torch.LongTensor] = None,
204
+ position_ids: Optional[torch.LongTensor] = None,
205
+ inputs_embeds: Optional[torch.FloatTensor] = None,
206
+ past_key_values_length: int = 0,
207
+ ) -> torch.Tensor:
208
+ if input_ids is not None:
209
+ input_shape = input_ids.size()
210
+ else:
211
+ input_shape = inputs_embeds.size()[:-1]
212
+
213
+ seq_length = input_shape[1]
214
+
215
+ if position_ids is None:
216
+ position_ids = self.position_ids[:, past_key_values_length : seq_length + past_key_values_length]
217
+
218
+ # Setting the token_type_ids to the registered buffer in constructor where it is all zeros, which usually occurs
219
+ # when it's auto-generated; the registered buffer helps users when tracing the model without passing token_type_ids, solves
220
+ # issue #5664
221
+ if token_type_ids is None:
222
+ if hasattr(self, "token_type_ids"):
223
+ buffered_token_type_ids = self.token_type_ids[:, :seq_length]
224
+ buffered_token_type_ids_expanded = buffered_token_type_ids.expand(input_shape[0], seq_length)
225
+ token_type_ids = buffered_token_type_ids_expanded
226
+ else:
227
+ token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)
228
+
229
+ if inputs_embeds is None:
230
+ inputs_embeds = self.word_embeddings(input_ids)
231
+ token_type_embeddings = self.token_type_embeddings(token_type_ids)
232
+
233
+ embeddings = inputs_embeds + token_type_embeddings
234
+ if self.position_embedding_type == "absolute":
235
+ position_embeddings = self.position_embeddings(position_ids)
236
+ embeddings += position_embeddings
237
+ embeddings = self.LayerNorm(embeddings)
238
+ embeddings = self.dropout(embeddings)
239
+ return embeddings
240
+
241
+
242
+ class BertSelfAttention(nn.Module):
243
+ def __init__(self, config, position_embedding_type=None):
244
+ super().__init__()
245
+ if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
246
+ raise ValueError(
247
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
248
+ f"heads ({config.num_attention_heads})"
249
+ )
250
+
251
+ self.num_attention_heads = config.num_attention_heads
252
+ self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
253
+ self.all_head_size = self.num_attention_heads * self.attention_head_size
254
+
255
+ self.query = nn.Linear(config.hidden_size, self.all_head_size)
256
+ self.key = nn.Linear(config.hidden_size, self.all_head_size)
257
+ self.value = nn.Linear(config.hidden_size, self.all_head_size)
258
+
259
+ self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
260
+ self.position_embedding_type = position_embedding_type or getattr(
261
+ config, "position_embedding_type", "absolute"
262
+ )
263
+ if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
264
+ self.max_position_embeddings = config.max_position_embeddings
265
+ self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)
266
+
267
+ self.is_decoder = config.is_decoder
268
+
269
+ def transpose_for_scores(self, x: torch.Tensor) -> torch.Tensor:
270
+ new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
271
+ x = x.view(new_x_shape)
272
+ return x.permute(0, 2, 1, 3)
273
+
274
+ def forward(
275
+ self,
276
+ hidden_states: torch.Tensor,
277
+ attention_mask: Optional[torch.FloatTensor] = None,
278
+ head_mask: Optional[torch.FloatTensor] = None,
279
+ encoder_hidden_states: Optional[torch.FloatTensor] = None,
280
+ encoder_attention_mask: Optional[torch.FloatTensor] = None,
281
+ past_key_value: Optional[Tuple[Tuple[torch.FloatTensor]]] = None,
282
+ output_attentions: Optional[bool] = False,
283
+ ) -> Tuple[torch.Tensor]:
284
+ mixed_query_layer = self.query(hidden_states)
285
+
286
+ # If this is instantiated as a cross-attention module, the keys
287
+ # and values come from an encoder; the attention mask needs to be
288
+ # such that the encoder's padding tokens are not attended to.
289
+ is_cross_attention = encoder_hidden_states is not None
290
+
291
+ if is_cross_attention and past_key_value is not None:
292
+ # reuse k,v, cross_attentions
293
+ key_layer = past_key_value[0]
294
+ value_layer = past_key_value[1]
295
+ attention_mask = encoder_attention_mask
296
+ elif is_cross_attention:
297
+ key_layer = self.transpose_for_scores(self.key(encoder_hidden_states))
298
+ value_layer = self.transpose_for_scores(self.value(encoder_hidden_states))
299
+ attention_mask = encoder_attention_mask
300
+ elif past_key_value is not None:
301
+ key_layer = self.transpose_for_scores(self.key(hidden_states))
302
+ value_layer = self.transpose_for_scores(self.value(hidden_states))
303
+ key_layer = torch.cat([past_key_value[0], key_layer], dim=2)
304
+ value_layer = torch.cat([past_key_value[1], value_layer], dim=2)
305
+ else:
306
+ key_layer = self.transpose_for_scores(self.key(hidden_states))
307
+ value_layer = self.transpose_for_scores(self.value(hidden_states))
308
+
309
+ query_layer = self.transpose_for_scores(mixed_query_layer)
310
+
311
+ use_cache = past_key_value is not None
312
+ if self.is_decoder:
313
+ # if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states.
314
+ # Further calls to cross_attention layer can then reuse all cross-attention
315
+ # key/value_states (first "if" case)
316
+ # if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of
317
+ # all previous decoder key/value_states. Further calls to uni-directional self-attention
318
+ # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
319
+ # if encoder bi-directional self-attention `past_key_value` is always `None`
320
+ past_key_value = (key_layer, value_layer)
321
+
322
+ # Take the dot product between "query" and "key" to get the raw attention scores.
323
+ attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
324
+
325
+ if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
326
+ query_length, key_length = query_layer.shape[2], key_layer.shape[2]
327
+ if use_cache:
328
+ position_ids_l = torch.tensor(key_length - 1, dtype=torch.long, device=hidden_states.device).view(
329
+ -1, 1
330
+ )
331
+ else:
332
+ position_ids_l = torch.arange(query_length, dtype=torch.long, device=hidden_states.device).view(-1, 1)
333
+ position_ids_r = torch.arange(key_length, dtype=torch.long, device=hidden_states.device).view(1, -1)
334
+ distance = position_ids_l - position_ids_r
335
+
336
+ positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1)
337
+ positional_embedding = positional_embedding.to(dtype=query_layer.dtype) # fp16 compatibility
338
+
339
+ if self.position_embedding_type == "relative_key":
340
+ relative_position_scores = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
341
+ attention_scores = attention_scores + relative_position_scores
342
+ elif self.position_embedding_type == "relative_key_query":
343
+ relative_position_scores_query = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding)
344
+ relative_position_scores_key = torch.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding)
345
+ attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key
346
+
347
+ attention_scores = attention_scores / math.sqrt(self.attention_head_size)
348
+ if attention_mask is not None:
349
+ # Apply the attention mask (precomputed for all layers in the BertModel forward() function)
350
+ attention_scores = attention_scores + attention_mask
351
+
352
+ # Normalize the attention scores to probabilities.
353
+ attention_probs = nn.functional.softmax(attention_scores, dim=-1)
354
+
355
+ # This is actually dropping out entire tokens to attend to, which might
356
+ # seem a bit unusual, but is taken from the original Transformer paper.
357
+ attention_probs = self.dropout(attention_probs)
358
+
359
+ # Mask heads if we want to
360
+ if head_mask is not None:
361
+ attention_probs = attention_probs * head_mask
362
+
363
+ context_layer = torch.matmul(attention_probs, value_layer)
364
+
365
+ context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
366
+ new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
367
+ context_layer = context_layer.view(new_context_layer_shape)
368
+
369
+ outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)
370
+
371
+ if self.is_decoder:
372
+ outputs = outputs + (past_key_value,)
373
+ return outputs
374
+
375
+
376
+ class BertSelfOutput(nn.Module):
377
+ def __init__(self, config):
378
+ super().__init__()
379
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
380
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
381
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
382
+
383
+ def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
384
+ hidden_states = self.dense(hidden_states)
385
+ hidden_states = self.dropout(hidden_states)
386
+ hidden_states = self.LayerNorm(hidden_states + input_tensor)
387
+ return hidden_states
388
+
389
+
390
+ class BertAttention(nn.Module):
391
+ def __init__(self, config, position_embedding_type=None):
392
+ super().__init__()
393
+ self.self = BertSelfAttention(config, position_embedding_type=position_embedding_type)
394
+ self.output = BertSelfOutput(config)
395
+ self.pruned_heads = set()
396
+
397
+ def prune_heads(self, heads):
398
+ if len(heads) == 0:
399
+ return
400
+ heads, index = find_pruneable_heads_and_indices(
401
+ heads, self.self.num_attention_heads, self.self.attention_head_size, self.pruned_heads
402
+ )
403
+
404
+ # Prune linear layers
405
+ self.self.query = prune_linear_layer(self.self.query, index)
406
+ self.self.key = prune_linear_layer(self.self.key, index)
407
+ self.self.value = prune_linear_layer(self.self.value, index)
408
+ self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)
409
+
410
+ # Update hyper params and store pruned heads
411
+ self.self.num_attention_heads = self.self.num_attention_heads - len(heads)
412
+ self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
413
+ self.pruned_heads = self.pruned_heads.union(heads)
414
+
415
+ def forward(
416
+ self,
417
+ hidden_states: torch.Tensor,
418
+ attention_mask: Optional[torch.FloatTensor] = None,
419
+ head_mask: Optional[torch.FloatTensor] = None,
420
+ encoder_hidden_states: Optional[torch.FloatTensor] = None,
421
+ encoder_attention_mask: Optional[torch.FloatTensor] = None,
422
+ past_key_value: Optional[Tuple[Tuple[torch.FloatTensor]]] = None,
423
+ output_attentions: Optional[bool] = False,
424
+ ) -> Tuple[torch.Tensor]:
425
+ self_outputs = self.self(
426
+ hidden_states,
427
+ attention_mask,
428
+ head_mask,
429
+ encoder_hidden_states,
430
+ encoder_attention_mask,
431
+ past_key_value,
432
+ output_attentions,
433
+ )
434
+ attention_output = self.output(self_outputs[0], hidden_states)
435
+ outputs = (attention_output,) + self_outputs[1:] # add attentions if we output them
436
+ return outputs
437
+
438
+
439
+ class BertIntermediate(nn.Module):
440
+ def __init__(self, config):
441
+ super().__init__()
442
+ self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
443
+ if isinstance(config.hidden_act, str):
444
+ self.intermediate_act_fn = ACT2FN[config.hidden_act]
445
+ else:
446
+ self.intermediate_act_fn = config.hidden_act
447
+
448
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
449
+ hidden_states = self.dense(hidden_states)
450
+ hidden_states = self.intermediate_act_fn(hidden_states)
451
+ return hidden_states
452
+
453
+
454
+ class BertOutput(nn.Module):
455
+ def __init__(self, config):
456
+ super().__init__()
457
+ self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
458
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
459
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
460
+
461
+ def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
462
+ hidden_states = self.dense(hidden_states)
463
+ hidden_states = self.dropout(hidden_states)
464
+ hidden_states = self.LayerNorm(hidden_states + input_tensor)
465
+ return hidden_states
466
+
467
+
468
+ class BertLayer(nn.Module):
469
+ def __init__(self, config):
470
+ super().__init__()
471
+ self.chunk_size_feed_forward = config.chunk_size_feed_forward
472
+ self.seq_len_dim = 1
473
+ self.attention = BertAttention(config)
474
+ self.is_decoder = config.is_decoder
475
+ self.add_cross_attention = config.add_cross_attention
476
+ if self.add_cross_attention:
477
+ if not self.is_decoder:
478
+ raise ValueError(f"{self} should be used as a decoder model if cross attention is added")
479
+ self.crossattention = BertAttention(config, position_embedding_type="absolute")
480
+ self.intermediate = BertIntermediate(config)
481
+ self.output = BertOutput(config)
482
+
483
+ def forward(
484
+ self,
485
+ hidden_states: torch.Tensor,
486
+ attention_mask: Optional[torch.FloatTensor] = None,
487
+ head_mask: Optional[torch.FloatTensor] = None,
488
+ encoder_hidden_states: Optional[torch.FloatTensor] = None,
489
+ encoder_attention_mask: Optional[torch.FloatTensor] = None,
490
+ past_key_value: Optional[Tuple[Tuple[torch.FloatTensor]]] = None,
491
+ output_attentions: Optional[bool] = False,
492
+ ) -> Tuple[torch.Tensor]:
493
+ # decoder uni-directional self-attention cached key/values tuple is at positions 1,2
494
+ self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None
495
+ self_attention_outputs = self.attention(
496
+ hidden_states,
497
+ attention_mask,
498
+ head_mask,
499
+ output_attentions=output_attentions,
500
+ past_key_value=self_attn_past_key_value,
501
+ )
502
+ attention_output = self_attention_outputs[0]
503
+
504
+ # if decoder, the last output is tuple of self-attn cache
505
+ if self.is_decoder:
506
+ outputs = self_attention_outputs[1:-1]
507
+ present_key_value = self_attention_outputs[-1]
508
+ else:
509
+ outputs = self_attention_outputs[1:] # add self attentions if we output attention weights
510
+
511
+ cross_attn_present_key_value = None
512
+ if self.is_decoder and encoder_hidden_states is not None:
513
+ if not hasattr(self, "crossattention"):
514
+ raise ValueError(
515
+ f"If `encoder_hidden_states` are passed, {self} has to be instantiated with cross-attention layers"
516
+ " by setting `config.add_cross_attention=True`"
517
+ )
518
+
519
+ # cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple
520
+ cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
521
+ cross_attention_outputs = self.crossattention(
522
+ attention_output,
523
+ attention_mask,
524
+ head_mask,
525
+ encoder_hidden_states,
526
+ encoder_attention_mask,
527
+ cross_attn_past_key_value,
528
+ output_attentions,
529
+ )
530
+ attention_output = cross_attention_outputs[0]
531
+ outputs = outputs + cross_attention_outputs[1:-1] # add cross attentions if we output attention weights
532
+
533
+ # add cross-attn cache to positions 3,4 of present_key_value tuple
534
+ cross_attn_present_key_value = cross_attention_outputs[-1]
535
+ present_key_value = present_key_value + cross_attn_present_key_value
536
+
537
+ layer_output = apply_chunking_to_forward(
538
+ self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output
539
+ )
540
+ outputs = (layer_output,) + outputs
541
+
542
+ # if decoder, return the attn key/values as the last output
543
+ if self.is_decoder:
544
+ outputs = outputs + (present_key_value,)
545
+
546
+ return outputs
547
+
548
+ def feed_forward_chunk(self, attention_output):
549
+ intermediate_output = self.intermediate(attention_output)
550
+ layer_output = self.output(intermediate_output, attention_output)
551
+ return layer_output
552
+
553
+
554
+ class BertEncoder(nn.Module):
555
+ def __init__(self, config):
556
+ super().__init__()
557
+ self.config = config
558
+ self.layer = nn.ModuleList([BertLayer(config) for _ in range(config.num_hidden_layers)])
559
+ self.gradient_checkpointing = False
560
+
561
+ def forward(
562
+ self,
563
+ hidden_states: torch.Tensor,
564
+ attention_mask: Optional[torch.FloatTensor] = None,
565
+ head_mask: Optional[torch.FloatTensor] = None,
566
+ encoder_hidden_states: Optional[torch.FloatTensor] = None,
567
+ encoder_attention_mask: Optional[torch.FloatTensor] = None,
568
+ past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None,
569
+ use_cache: Optional[bool] = None,
570
+ output_attentions: Optional[bool] = False,
571
+ output_hidden_states: Optional[bool] = False,
572
+ return_dict: Optional[bool] = True,
573
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPastAndCrossAttentions]:
574
+ all_hidden_states = () if output_hidden_states else None
575
+ all_self_attentions = () if output_attentions else None
576
+ all_cross_attentions = () if output_attentions and self.config.add_cross_attention else None
577
+
578
+ if self.gradient_checkpointing and self.training:
579
+ if use_cache:
580
+ logger.warning_once(
581
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
582
+ )
583
+ use_cache = False
584
+
585
+ next_decoder_cache = () if use_cache else None
586
+ for i, layer_module in enumerate(self.layer):
587
+ if output_hidden_states:
588
+ all_hidden_states = all_hidden_states + (hidden_states,)
589
+
590
+ layer_head_mask = head_mask[i] if head_mask is not None else None
591
+ past_key_value = past_key_values[i] if past_key_values is not None else None
592
+
593
+ if self.gradient_checkpointing and self.training:
594
+
595
+ def create_custom_forward(module):
596
+ def custom_forward(*inputs):
597
+ return module(*inputs, past_key_value, output_attentions)
598
+
599
+ return custom_forward
600
+
601
+ layer_outputs = torch.utils.checkpoint.checkpoint(
602
+ create_custom_forward(layer_module),
603
+ hidden_states,
604
+ attention_mask,
605
+ layer_head_mask,
606
+ encoder_hidden_states,
607
+ encoder_attention_mask,
608
+ )
609
+ else:
610
+ layer_outputs = layer_module(
611
+ hidden_states,
612
+ attention_mask,
613
+ layer_head_mask,
614
+ encoder_hidden_states,
615
+ encoder_attention_mask,
616
+ past_key_value,
617
+ output_attentions,
618
+ )
619
+
620
+ hidden_states = layer_outputs[0]
621
+ if use_cache:
622
+ next_decoder_cache += (layer_outputs[-1],)
623
+ if output_attentions:
624
+ all_self_attentions = all_self_attentions + (layer_outputs[1],)
625
+ if self.config.add_cross_attention:
626
+ all_cross_attentions = all_cross_attentions + (layer_outputs[2],)
627
+
628
+ if output_hidden_states:
629
+ all_hidden_states = all_hidden_states + (hidden_states,)
630
+
631
+ if not return_dict:
632
+ return tuple(
633
+ v
634
+ for v in [
635
+ hidden_states,
636
+ next_decoder_cache,
637
+ all_hidden_states,
638
+ all_self_attentions,
639
+ all_cross_attentions,
640
+ ]
641
+ if v is not None
642
+ )
643
+ return BaseModelOutputWithPastAndCrossAttentions(
644
+ last_hidden_state=hidden_states,
645
+ past_key_values=next_decoder_cache,
646
+ hidden_states=all_hidden_states,
647
+ attentions=all_self_attentions,
648
+ cross_attentions=all_cross_attentions,
649
+ )
650
+
651
+
652
+ class BertPooler(nn.Module):
653
+ def __init__(self, config):
654
+ super().__init__()
655
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
656
+ self.activation = nn.Tanh()
657
+
658
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
659
+ # We "pool" the model by simply taking the hidden state corresponding
660
+ # to the first token.
661
+ first_token_tensor = hidden_states[:, 0]
662
+ pooled_output = self.dense(first_token_tensor)
663
+ pooled_output = self.activation(pooled_output)
664
+ return pooled_output
665
+
666
+
667
+ class BertPredictionHeadTransform(nn.Module):
668
+ def __init__(self, config):
669
+ super().__init__()
670
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
671
+ if isinstance(config.hidden_act, str):
672
+ self.transform_act_fn = ACT2FN[config.hidden_act]
673
+ else:
674
+ self.transform_act_fn = config.hidden_act
675
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
676
+
677
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
678
+ hidden_states = self.dense(hidden_states)
679
+ hidden_states = self.transform_act_fn(hidden_states)
680
+ hidden_states = self.LayerNorm(hidden_states)
681
+ return hidden_states
682
+
683
+
684
+ class BertLMPredictionHead(nn.Module):
685
+ def __init__(self, config):
686
+ super().__init__()
687
+ self.transform = BertPredictionHeadTransform(config)
688
+
689
+ # The output weights are the same as the input embeddings, but there is
690
+ # an output-only bias for each token.
691
+ self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
692
+
693
+ self.bias = nn.Parameter(torch.zeros(config.vocab_size))
694
+
695
+ # Need a link between the two variables so that the bias is correctly resized with `resize_token_embeddings`
696
+ self.decoder.bias = self.bias
697
+
698
+ def forward(self, hidden_states):
699
+ hidden_states = self.transform(hidden_states)
700
+ hidden_states = self.decoder(hidden_states)
701
+ return hidden_states
702
+
703
+
704
+ class BertOnlyMLMHead(nn.Module):
705
+ def __init__(self, config):
706
+ super().__init__()
707
+ self.predictions = BertLMPredictionHead(config)
708
+
709
+ def forward(self, sequence_output: torch.Tensor) -> torch.Tensor:
710
+ prediction_scores = self.predictions(sequence_output)
711
+ return prediction_scores
712
+
713
+
714
+ class BertOnlyNSPHead(nn.Module):
715
+ def __init__(self, config):
716
+ super().__init__()
717
+ self.seq_relationship = nn.Linear(config.hidden_size, 2)
718
+
719
+ def forward(self, pooled_output):
720
+ seq_relationship_score = self.seq_relationship(pooled_output)
721
+ return seq_relationship_score
722
+
723
+
724
+ class BertPreTrainingHeads(nn.Module):
725
+ def __init__(self, config):
726
+ super().__init__()
727
+ self.predictions = BertLMPredictionHead(config)
728
+ self.seq_relationship = nn.Linear(config.hidden_size, 2)
729
+
730
+ def forward(self, sequence_output, pooled_output):
731
+ prediction_scores = self.predictions(sequence_output)
732
+ seq_relationship_score = self.seq_relationship(pooled_output)
733
+ return prediction_scores, seq_relationship_score
734
+
735
+
736
+ class BertPreTrainedModel(PreTrainedModel):
737
+ """
738
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
739
+ models.
740
+ """
741
+
742
+ config_class = BertConfig
743
+ load_tf_weights = load_tf_weights_in_bert
744
+ base_model_prefix = "bert"
745
+ supports_gradient_checkpointing = True
746
+ _keys_to_ignore_on_load_missing = [r"position_ids"]
747
+
748
+ def _init_weights(self, module):
749
+ """Initialize the weights"""
750
+ if isinstance(module, nn.Linear):
751
+ # Slightly different from the TF version which uses truncated_normal for initialization
752
+ # cf https://github.com/pytorch/pytorch/pull/5617
753
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
754
+ if module.bias is not None:
755
+ module.bias.data.zero_()
756
+ elif isinstance(module, nn.Embedding):
757
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
758
+ if module.padding_idx is not None:
759
+ module.weight.data[module.padding_idx].zero_()
760
+ elif isinstance(module, nn.LayerNorm):
761
+ module.bias.data.zero_()
762
+ module.weight.data.fill_(1.0)
763
+
764
+ def _set_gradient_checkpointing(self, module, value=False):
765
+ if isinstance(module, BertEncoder):
766
+ module.gradient_checkpointing = value
767
+
768
+
769
+ @dataclass
770
+ class BertForPreTrainingOutput(ModelOutput):
771
+ """
772
+ Output type of [`BertForPreTraining`].
773
+
774
+ Args:
775
+ loss (*optional*, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`):
776
+ Total loss as the sum of the masked language modeling loss and the next sequence prediction
777
+ (classification) loss.
778
+ prediction_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`):
779
+ Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
780
+ seq_relationship_logits (`torch.FloatTensor` of shape `(batch_size, 2)`):
781
+ Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation
782
+ before SoftMax).
783
+ hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
784
+ Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
785
+ shape `(batch_size, sequence_length, hidden_size)`.
786
+
787
+ Hidden-states of the model at the output of each layer plus the initial embedding outputs.
788
+ attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
789
+ Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
790
+ sequence_length)`.
791
+
792
+ Attention weights after the attention softmax, used to compute the weighted average in the self-attention
793
+ heads.
794
+ """
795
+
796
+ loss: Optional[torch.FloatTensor] = None
797
+ prediction_logits: torch.FloatTensor = None
798
+ seq_relationship_logits: torch.FloatTensor = None
799
+ hidden_states: Optional[Tuple[torch.FloatTensor]] = None
800
+ attentions: Optional[Tuple[torch.FloatTensor]] = None
801
+
802
+
803
+ BERT_START_DOCSTRING = r"""
804
+
805
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
806
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
807
+ etc.)
808
+
809
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
810
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
811
+ and behavior.
812
+
813
+ Parameters:
814
+ config ([`BertConfig`]): Model configuration class with all the parameters of the model.
815
+ Initializing with a config file does not load the weights associated with the model, only the
816
+ configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
817
+ """
818
+
819
+ BERT_INPUTS_DOCSTRING = r"""
820
+ Args:
821
+ input_ids (`torch.LongTensor` of shape `({0})`):
822
+ Indices of input sequence tokens in the vocabulary.
823
+
824
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
825
+ [`PreTrainedTokenizer.__call__`] for details.
826
+
827
+ [What are input IDs?](../glossary#input-ids)
828
+ attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
829
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
830
+
831
+ - 1 for tokens that are **not masked**,
832
+ - 0 for tokens that are **masked**.
833
+
834
+ [What are attention masks?](../glossary#attention-mask)
835
+ token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
836
+ Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
837
+ 1]`:
838
+
839
+ - 0 corresponds to a *sentence A* token,
840
+ - 1 corresponds to a *sentence B* token.
841
+
842
+ [What are token type IDs?](../glossary#token-type-ids)
843
+ position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
844
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
845
+ config.max_position_embeddings - 1]`.
846
+
847
+ [What are position IDs?](../glossary#position-ids)
848
+ head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
849
+ Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
850
+
851
+ - 1 indicates the head is **not masked**,
852
+ - 0 indicates the head is **masked**.
853
+
854
+ inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
855
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
856
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
857
+ model's internal embedding lookup matrix.
858
+ output_attentions (`bool`, *optional*):
859
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
860
+ tensors for more detail.
861
+ output_hidden_states (`bool`, *optional*):
862
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
863
+ more detail.
864
+ return_dict (`bool`, *optional*):
865
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
866
+ """
867
+
868
+
869
+ @add_start_docstrings(
870
+ "The bare Bert Model transformer outputting raw hidden-states without any specific head on top.",
871
+ BERT_START_DOCSTRING,
872
+ )
873
+ class BertModel(BertPreTrainedModel):
874
+ """
875
+
876
+ The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
877
+ cross-attention is added between the self-attention layers, following the architecture described in [Attention is
878
+ all you need](https://arxiv.org/abs/1706.03762) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
879
+ Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.
880
+
881
+ To behave as a decoder, the model needs to be initialized with the `is_decoder` argument of the configuration set
882
+ to `True`. To be used in a Seq2Seq model, the model needs to be initialized with both the `is_decoder` argument and
883
+ `add_cross_attention` set to `True`; an `encoder_hidden_states` is then expected as an input to the forward pass.
884
+ """
885
+
886
+ def __init__(self, config, add_pooling_layer=True):
887
+ super().__init__(config)
888
+ self.config = config
889
+
890
+ self.embeddings = BertEmbeddings(config)
891
+ self.encoder = BertEncoder(config)
892
+
893
+ self.pooler = BertPooler(config) if add_pooling_layer else None
894
+
895
+ # Initialize weights and apply final processing
896
+ self.post_init()
897
+
898
+ def get_input_embeddings(self):
899
+ return self.embeddings.word_embeddings
900
+
901
+ def set_input_embeddings(self, value):
902
+ self.embeddings.word_embeddings = value
903
+
904
+ def _prune_heads(self, heads_to_prune):
905
+ """
906
+ Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
907
+ class PreTrainedModel
908
+ """
909
+ for layer, heads in heads_to_prune.items():
910
+ self.encoder.layer[layer].attention.prune_heads(heads)
911
+
912
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
913
+ @add_code_sample_docstrings(
914
+ checkpoint=_CHECKPOINT_FOR_DOC,
915
+ output_type=BaseModelOutputWithPoolingAndCrossAttentions,
916
+ config_class=_CONFIG_FOR_DOC,
917
+ )
918
+ def forward(
919
+ self,
920
+ input_ids: Optional[torch.Tensor] = None,
921
+ attention_mask: Optional[torch.Tensor] = None,
922
+ token_type_ids: Optional[torch.Tensor] = None,
923
+ position_ids: Optional[torch.Tensor] = None,
924
+ head_mask: Optional[torch.Tensor] = None,
925
+ inputs_embeds: Optional[torch.Tensor] = None,
926
+ encoder_hidden_states: Optional[torch.Tensor] = None,
927
+ encoder_attention_mask: Optional[torch.Tensor] = None,
928
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
929
+ use_cache: Optional[bool] = None,
930
+ output_attentions: Optional[bool] = None,
931
+ output_hidden_states: Optional[bool] = None,
932
+ return_dict: Optional[bool] = None,
933
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
934
+ r"""
935
+ encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
936
+ Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
937
+ the model is configured as a decoder.
938
+ encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
939
+ Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
940
+ the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
941
+
942
+ - 1 for tokens that are **not masked**,
943
+ - 0 for tokens that are **masked**.
944
+ past_key_values (`tuple(tuple(torch.FloatTensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
945
+ Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
946
+
947
+ If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
948
+ don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
949
+ `decoder_input_ids` of shape `(batch_size, sequence_length)`.
950
+ use_cache (`bool`, *optional*):
951
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
952
+ `past_key_values`).
953
+ """
954
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
955
+ output_hidden_states = (
956
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
957
+ )
958
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
959
+
960
+ if self.config.is_decoder:
961
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
962
+ else:
963
+ use_cache = False
964
+
965
+ if input_ids is not None and inputs_embeds is not None:
966
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
967
+ elif input_ids is not None:
968
+ input_shape = input_ids.size()
969
+ elif inputs_embeds is not None:
970
+ input_shape = inputs_embeds.size()[:-1]
971
+ else:
972
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
973
+
974
+ batch_size, seq_length = input_shape
975
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
976
+
977
+ # past_key_values_length
978
+ past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
979
+
980
+ if attention_mask is None:
981
+ attention_mask = torch.ones(((batch_size, seq_length + past_key_values_length)), device=device)
982
+
983
+ if token_type_ids is None:
984
+ if hasattr(self.embeddings, "token_type_ids"):
985
+ buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
986
+ buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
987
+ token_type_ids = buffered_token_type_ids_expanded
988
+ else:
989
+ token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
990
+
991
+ # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
992
+ # ourselves in which case we just need to make it broadcastable to all heads.
993
+ extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape)
994
+
995
+ # If a 2D or 3D attention mask is provided for the cross-attention
996
+ # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
997
+ if self.config.is_decoder and encoder_hidden_states is not None:
998
+ encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size()
999
+ encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
1000
+ if encoder_attention_mask is None:
1001
+ encoder_attention_mask = torch.ones(encoder_hidden_shape, device=device)
1002
+ encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
1003
+ else:
1004
+ encoder_extended_attention_mask = None
1005
+
1006
+ # Prepare head mask if needed
1007
+ # 1.0 in head_mask indicate we keep the head
1008
+ # attention_probs has shape bsz x n_heads x N x N
1009
+ # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
1010
+ # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
1011
+ head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
1012
+
1013
+ embedding_output = self.embeddings(
1014
+ input_ids=input_ids,
1015
+ position_ids=position_ids,
1016
+ token_type_ids=token_type_ids,
1017
+ inputs_embeds=inputs_embeds,
1018
+ past_key_values_length=past_key_values_length,
1019
+ )
1020
+ encoder_outputs = self.encoder(
1021
+ embedding_output,
1022
+ attention_mask=extended_attention_mask,
1023
+ head_mask=head_mask,
1024
+ encoder_hidden_states=encoder_hidden_states,
1025
+ encoder_attention_mask=encoder_extended_attention_mask,
1026
+ past_key_values=past_key_values,
1027
+ use_cache=use_cache,
1028
+ output_attentions=output_attentions,
1029
+ output_hidden_states=output_hidden_states,
1030
+ return_dict=return_dict,
1031
+ )
1032
+ sequence_output = encoder_outputs[0]
1033
+ pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
1034
+
1035
+ if not return_dict:
1036
+ return (sequence_output, pooled_output) + encoder_outputs[1:]
1037
+
1038
+ return BaseModelOutputWithPoolingAndCrossAttentions(
1039
+ last_hidden_state=sequence_output,
1040
+ pooler_output=pooled_output,
1041
+ past_key_values=encoder_outputs.past_key_values,
1042
+ hidden_states=encoder_outputs.hidden_states,
1043
+ attentions=encoder_outputs.attentions,
1044
+ cross_attentions=encoder_outputs.cross_attentions,
1045
+ )
1046
+
1047
+
1048
+ @add_start_docstrings(
1049
+ """
1050
+ Bert Model with two heads on top as done during the pretraining: a `masked language modeling` head and a `next
1051
+ sentence prediction (classification)` head.
1052
+ """,
1053
+ BERT_START_DOCSTRING,
1054
+ )
1055
+ class BertForPreTraining(BertPreTrainedModel):
1056
+ _keys_to_ignore_on_load_missing = [r"position_ids", r"predictions.decoder.bias", r"cls.predictions.decoder.weight"]
1057
+
1058
+ def __init__(self, config):
1059
+ super().__init__(config)
1060
+
1061
+ self.bert = BertModel(config)
1062
+ self.cls = BertPreTrainingHeads(config)
1063
+
1064
+ # Initialize weights and apply final processing
1065
+ self.post_init()
1066
+
1067
+ def get_output_embeddings(self):
1068
+ return self.cls.predictions.decoder
1069
+
1070
+ def set_output_embeddings(self, new_embeddings):
1071
+ self.cls.predictions.decoder = new_embeddings
1072
+
1073
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1074
+ @replace_return_docstrings(output_type=BertForPreTrainingOutput, config_class=_CONFIG_FOR_DOC)
1075
+ def forward(
1076
+ self,
1077
+ input_ids: Optional[torch.Tensor] = None,
1078
+ attention_mask: Optional[torch.Tensor] = None,
1079
+ token_type_ids: Optional[torch.Tensor] = None,
1080
+ position_ids: Optional[torch.Tensor] = None,
1081
+ head_mask: Optional[torch.Tensor] = None,
1082
+ inputs_embeds: Optional[torch.Tensor] = None,
1083
+ labels: Optional[torch.Tensor] = None,
1084
+ next_sentence_label: Optional[torch.Tensor] = None,
1085
+ output_attentions: Optional[bool] = None,
1086
+ output_hidden_states: Optional[bool] = None,
1087
+ return_dict: Optional[bool] = None,
1088
+ ) -> Union[Tuple[torch.Tensor], BertForPreTrainingOutput]:
1089
+ r"""
1090
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1091
+ Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
1092
+ config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked),
1093
+ the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
1094
+ next_sentence_label (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1095
+ Labels for computing the next sequence prediction (classification) loss. Input should be a sequence
1096
+ pair (see `input_ids` docstring) Indices should be in `[0, 1]`:
1097
+
1098
+ - 0 indicates sequence B is a continuation of sequence A,
1099
+ - 1 indicates sequence B is a random sequence.
1100
+ kwargs (`Dict[str, any]`, optional, defaults to *{}*):
1101
+ Used to hide legacy arguments that have been deprecated.
1102
+
1103
+ Returns:
1104
+
1105
+ Example:
1106
+
1107
+ ```python
1108
+ >>> from transformers import AutoTokenizer, BertForPreTraining
1109
+ >>> import torch
1110
+
1111
+ >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
1112
+ >>> model = BertForPreTraining.from_pretrained("bert-base-uncased")
1113
+
1114
+ >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
1115
+ >>> outputs = model(**inputs)
1116
+
1117
+ >>> prediction_logits = outputs.prediction_logits
1118
+ >>> seq_relationship_logits = outputs.seq_relationship_logits
1119
+ ```
1120
+ """
1121
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1122
+
1123
+ outputs = self.bert(
1124
+ input_ids,
1125
+ attention_mask=attention_mask,
1126
+ token_type_ids=token_type_ids,
1127
+ position_ids=position_ids,
1128
+ head_mask=head_mask,
1129
+ inputs_embeds=inputs_embeds,
1130
+ output_attentions=output_attentions,
1131
+ output_hidden_states=output_hidden_states,
1132
+ return_dict=return_dict,
1133
+ )
1134
+
1135
+ sequence_output, pooled_output = outputs[:2]
1136
+ prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)
1137
+
1138
+ total_loss = None
1139
+ if labels is not None and next_sentence_label is not None:
1140
+ loss_fct = CrossEntropyLoss()
1141
+ masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
1142
+ next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
1143
+ total_loss = masked_lm_loss + next_sentence_loss
1144
+
1145
+ if not return_dict:
1146
+ output = (prediction_scores, seq_relationship_score) + outputs[2:]
1147
+ return ((total_loss,) + output) if total_loss is not None else output
1148
+
1149
+ return BertForPreTrainingOutput(
1150
+ loss=total_loss,
1151
+ prediction_logits=prediction_scores,
1152
+ seq_relationship_logits=seq_relationship_score,
1153
+ hidden_states=outputs.hidden_states,
1154
+ attentions=outputs.attentions,
1155
+ )
1156
+
1157
+
1158
+ @add_start_docstrings(
1159
+ """Bert Model with a `language modeling` head on top for CLM fine-tuning.""", BERT_START_DOCSTRING
1160
+ )
1161
+ class BertCustomLMHeadModel(BertPreTrainedModel):
1162
+ _keys_to_ignore_on_load_unexpected = [r"pooler"]
1163
+ _keys_to_ignore_on_load_missing = [r"position_ids", r"predictions.decoder.bias", r"cls.predictions.decoder.weight"]
1164
+
1165
+ def __init__(self, config):
1166
+ super().__init__(config)
1167
+
1168
+ if not config.is_decoder:
1169
+ logger.warning("If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`")
1170
+
1171
+ self.bert = BertModel(config, add_pooling_layer=False)
1172
+ self.cls = BertOnlyMLMHead(config)
1173
+
1174
+ # Initialize weights and apply final processing
1175
+ self.post_init()
1176
+
1177
+ def get_output_embeddings(self):
1178
+ return self.cls.predictions.decoder
1179
+
1180
+ def set_output_embeddings(self, new_embeddings):
1181
+ self.cls.predictions.decoder = new_embeddings
1182
+
1183
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1184
+ @add_code_sample_docstrings(
1185
+ checkpoint=_CHECKPOINT_FOR_DOC,
1186
+ output_type=CausalLMOutputWithCrossAttentions,
1187
+ config_class=_CONFIG_FOR_DOC,
1188
+ )
1189
+ def forward(
1190
+ self,
1191
+ input_ids: Optional[torch.Tensor] = None,
1192
+ attention_mask: Optional[torch.Tensor] = None,
1193
+ token_type_ids: Optional[torch.Tensor] = None,
1194
+ position_ids: Optional[torch.Tensor] = None,
1195
+ head_mask: Optional[torch.Tensor] = None,
1196
+ inputs_embeds: Optional[torch.Tensor] = None,
1197
+ encoder_hidden_states: Optional[torch.Tensor] = None,
1198
+ encoder_attention_mask: Optional[torch.Tensor] = None,
1199
+ labels: Optional[torch.Tensor] = None,
1200
+ past_key_values: Optional[List[torch.Tensor]] = None,
1201
+ use_cache: Optional[bool] = None,
1202
+ output_attentions: Optional[bool] = None,
1203
+ output_hidden_states: Optional[bool] = None,
1204
+ return_dict: Optional[bool] = None,
1205
+ ) -> Union[Tuple[torch.Tensor], CausalLMOutputWithCrossAttentions]:
1206
+ r"""
1207
+ encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
1208
+ Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
1209
+ the model is configured as a decoder.
1210
+ encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
1211
+ Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
1212
+ the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
1213
+
1214
+ - 1 for tokens that are **not masked**,
1215
+ - 0 for tokens that are **masked**.
1216
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1217
+ Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in
1218
+ `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are
1219
+ ignored (masked); the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
1220
+ past_key_values (`tuple(tuple(torch.FloatTensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
1221
+ Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
1222
+
1223
+ If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
1224
+ don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
1225
+ `decoder_input_ids` of shape `(batch_size, sequence_length)`.
1226
+ use_cache (`bool`, *optional*):
1227
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
1228
+ `past_key_values`).
1229
+ """
1230
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1231
+ if labels is not None:
1232
+ use_cache = False
1233
+
1234
+ outputs = self.bert(
1235
+ input_ids,
1236
+ attention_mask=attention_mask,
1237
+ token_type_ids=token_type_ids,
1238
+ position_ids=position_ids,
1239
+ head_mask=head_mask,
1240
+ inputs_embeds=inputs_embeds,
1241
+ encoder_hidden_states=encoder_hidden_states,
1242
+ encoder_attention_mask=encoder_attention_mask,
1243
+ past_key_values=past_key_values,
1244
+ use_cache=use_cache,
1245
+ output_attentions=output_attentions,
1246
+ output_hidden_states=output_hidden_states,
1247
+ return_dict=return_dict,
1248
+ )
1249
+
1250
+ sequence_output = outputs[0]
1251
+ prediction_scores = self.cls(sequence_output)
1252
+
1253
+ lm_loss = None
1254
+ if labels is not None:
1255
+ # we are doing next-token prediction; shift prediction scores and input ids by one
1256
+ shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()
1257
+ labels = labels[:, 1:].contiguous()
1258
+ loss_fct = CrossEntropyLoss()
1259
+ lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
1260
+
1261
+ if not return_dict:
1262
+ output = (prediction_scores,) + outputs[2:]
1263
+ return ((lm_loss,) + output) if lm_loss is not None else output
1264
+
1265
+ return CausalLMOutputWithCrossAttentions(
1266
+ loss=lm_loss,
1267
+ logits=prediction_scores,
1268
+ past_key_values=outputs.past_key_values,
1269
+ hidden_states=outputs.hidden_states,
1270
+ attentions=outputs.attentions,
1271
+ cross_attentions=outputs.cross_attentions,
1272
+ )
1273
+
1274
+ def prepare_inputs_for_generation(
1275
+ self, input_ids, past_key_values=None, attention_mask=None, use_cache=True, **model_kwargs
1276
+ ):
1277
+ input_shape = input_ids.shape
1278
+ # if the model is used as a decoder in an encoder-decoder model, the decoder attention mask is created on the fly
1279
+ if attention_mask is None:
1280
+ attention_mask = input_ids.new_ones(input_shape)
1281
+
1282
+ # cut decoder_input_ids if past_key_values is used
1283
+ if past_key_values is not None:
1284
+ input_ids = input_ids[:, -1:]
1285
+
1286
+ return {
1287
+ "input_ids": input_ids,
1288
+ "attention_mask": attention_mask,
1289
+ "past_key_values": past_key_values,
1290
+ "use_cache": use_cache,
1291
+ }
1292
+
1293
+ def _reorder_cache(self, past_key_values, beam_idx):
1294
+ reordered_past = ()
1295
+ for layer_past in past_key_values:
1296
+ reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
1297
+ return reordered_past
1298
+
1299
+
1300
+ @add_start_docstrings("""Bert Model with a `language modeling` head on top.""", BERT_START_DOCSTRING)
1301
+ class BertForMaskedLM(BertPreTrainedModel):
1302
+ _keys_to_ignore_on_load_unexpected = [r"pooler"]
1303
+ _keys_to_ignore_on_load_missing = [r"position_ids", r"predictions.decoder.bias", r"cls.predictions.decoder.weight"]
1304
+
1305
+ def __init__(self, config):
1306
+ super().__init__(config)
1307
+
1308
+ if config.is_decoder:
1309
+ logger.warning(
1310
+ "If you want to use `BertForMaskedLM` make sure `config.is_decoder=False` for "
1311
+ "bi-directional self-attention."
1312
+ )
1313
+
1314
+ self.bert = BertModel(config, add_pooling_layer=False)
1315
+ self.cls = BertOnlyMLMHead(config)
1316
+
1317
+ # Initialize weights and apply final processing
1318
+ self.post_init()
1319
+
1320
+ def get_output_embeddings(self):
1321
+ return self.cls.predictions.decoder
1322
+
1323
+ def set_output_embeddings(self, new_embeddings):
1324
+ self.cls.predictions.decoder = new_embeddings
1325
+
1326
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1327
+ @add_code_sample_docstrings(
1328
+ checkpoint=_CHECKPOINT_FOR_DOC,
1329
+ output_type=MaskedLMOutput,
1330
+ config_class=_CONFIG_FOR_DOC,
1331
+ expected_output="'paris'",
1332
+ expected_loss=0.88,
1333
+ )
1334
+ def forward(
1335
+ self,
1336
+ input_ids: Optional[torch.Tensor] = None,
1337
+ attention_mask: Optional[torch.Tensor] = None,
1338
+ token_type_ids: Optional[torch.Tensor] = None,
1339
+ position_ids: Optional[torch.Tensor] = None,
1340
+ head_mask: Optional[torch.Tensor] = None,
1341
+ inputs_embeds: Optional[torch.Tensor] = None,
1342
+ encoder_hidden_states: Optional[torch.Tensor] = None,
1343
+ encoder_attention_mask: Optional[torch.Tensor] = None,
1344
+ labels: Optional[torch.Tensor] = None,
1345
+ output_attentions: Optional[bool] = None,
1346
+ output_hidden_states: Optional[bool] = None,
1347
+ return_dict: Optional[bool] = None,
1348
+ ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
1349
+ r"""
1350
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1351
+ Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
1352
+ config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked); the
1353
+ loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
1354
+ """
1355
+
1356
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1357
+
1358
+ outputs = self.bert(
1359
+ input_ids,
1360
+ attention_mask=attention_mask,
1361
+ token_type_ids=token_type_ids,
1362
+ position_ids=position_ids,
1363
+ head_mask=head_mask,
1364
+ inputs_embeds=inputs_embeds,
1365
+ encoder_hidden_states=encoder_hidden_states,
1366
+ encoder_attention_mask=encoder_attention_mask,
1367
+ output_attentions=output_attentions,
1368
+ output_hidden_states=output_hidden_states,
1369
+ return_dict=return_dict,
1370
+ )
1371
+
1372
+ sequence_output = outputs[0]
1373
+ prediction_scores = self.cls(sequence_output)
1374
+
1375
+ masked_lm_loss = None
1376
+ if labels is not None:
1377
+ loss_fct = CrossEntropyLoss() # -100 index = padding token
1378
+ masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
1379
+
1380
+ if not return_dict:
1381
+ output = (prediction_scores,) + outputs[2:]
1382
+ return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
1383
+
1384
+ return MaskedLMOutput(
1385
+ loss=masked_lm_loss,
1386
+ logits=prediction_scores,
1387
+ hidden_states=outputs.hidden_states,
1388
+ attentions=outputs.attentions,
1389
+ )
1390
+
1391
+ def prepare_inputs_for_generation(self, input_ids, attention_mask=None, **model_kwargs):
1392
+ input_shape = input_ids.shape
1393
+ effective_batch_size = input_shape[0]
1394
+
1395
+ # add a dummy token
1396
+ if self.config.pad_token_id is None:
1397
+ raise ValueError("The PAD token should be defined for generation")
1398
+
1399
+ attention_mask = torch.cat([attention_mask, attention_mask.new_zeros((attention_mask.shape[0], 1))], dim=-1)
1400
+ dummy_token = torch.full(
1401
+ (effective_batch_size, 1), self.config.pad_token_id, dtype=torch.long, device=input_ids.device
1402
+ )
1403
+ input_ids = torch.cat([input_ids, dummy_token], dim=1)
1404
+
1405
+ return {"input_ids": input_ids, "attention_mask": attention_mask}
1406
+
1407
+
1408
+ @add_start_docstrings(
1409
+ """Bert Model with a `next sentence prediction (classification)` head on top.""",
1410
+ BERT_START_DOCSTRING,
1411
+ )
1412
+ class BertForNextSentencePrediction(BertPreTrainedModel):
1413
+ def __init__(self, config):
1414
+ super().__init__(config)
1415
+
1416
+ self.bert = BertModel(config)
1417
+ self.cls = BertOnlyNSPHead(config)
1418
+
1419
+ # Initialize weights and apply final processing
1420
+ self.post_init()
1421
+
1422
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1423
+ @replace_return_docstrings(output_type=NextSentencePredictorOutput, config_class=_CONFIG_FOR_DOC)
1424
+ def forward(
1425
+ self,
1426
+ input_ids: Optional[torch.Tensor] = None,
1427
+ attention_mask: Optional[torch.Tensor] = None,
1428
+ token_type_ids: Optional[torch.Tensor] = None,
1429
+ position_ids: Optional[torch.Tensor] = None,
1430
+ head_mask: Optional[torch.Tensor] = None,
1431
+ inputs_embeds: Optional[torch.Tensor] = None,
1432
+ labels: Optional[torch.Tensor] = None,
1433
+ output_attentions: Optional[bool] = None,
1434
+ output_hidden_states: Optional[bool] = None,
1435
+ return_dict: Optional[bool] = None,
1436
+ **kwargs,
1437
+ ) -> Union[Tuple[torch.Tensor], NextSentencePredictorOutput]:
1438
+ r"""
1439
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1440
+ Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair
1441
+ (see `input_ids` docstring). Indices should be in `[0, 1]`:
1442
+
1443
+ - 0 indicates sequence B is a continuation of sequence A,
1444
+ - 1 indicates sequence B is a random sequence.
1445
+
1446
+ Returns:
1447
+
1448
+ Example:
1449
+
1450
+ ```python
1451
+ >>> from transformers import AutoTokenizer, BertForNextSentencePrediction
1452
+ >>> import torch
1453
+
1454
+ >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
1455
+ >>> model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
1456
+
1457
+ >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
1458
+ >>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
1459
+ >>> encoding = tokenizer(prompt, next_sentence, return_tensors="pt")
1460
+
1461
+ >>> outputs = model(**encoding, labels=torch.LongTensor([1]))
1462
+ >>> logits = outputs.logits
1463
+ >>> assert logits[0, 0] < logits[0, 1] # next sentence was random
1464
+ ```
1465
+ """
1466
+
1467
+ if "next_sentence_label" in kwargs:
1468
+ warnings.warn(
1469
+ "The `next_sentence_label` argument is deprecated and will be removed in a future version, use"
1470
+ " `labels` instead.",
1471
+ FutureWarning,
1472
+ )
1473
+ labels = kwargs.pop("next_sentence_label")
1474
+
1475
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1476
+
1477
+ outputs = self.bert(
1478
+ input_ids,
1479
+ attention_mask=attention_mask,
1480
+ token_type_ids=token_type_ids,
1481
+ position_ids=position_ids,
1482
+ head_mask=head_mask,
1483
+ inputs_embeds=inputs_embeds,
1484
+ output_attentions=output_attentions,
1485
+ output_hidden_states=output_hidden_states,
1486
+ return_dict=return_dict,
1487
+ )
1488
+
1489
+ pooled_output = outputs[1]
1490
+
1491
+ seq_relationship_scores = self.cls(pooled_output)
1492
+
1493
+ next_sentence_loss = None
1494
+ if labels is not None:
1495
+ loss_fct = CrossEntropyLoss()
1496
+ next_sentence_loss = loss_fct(seq_relationship_scores.view(-1, 2), labels.view(-1))
1497
+
1498
+ if not return_dict:
1499
+ output = (seq_relationship_scores,) + outputs[2:]
1500
+ return ((next_sentence_loss,) + output) if next_sentence_loss is not None else output
1501
+
1502
+ return NextSentencePredictorOutput(
1503
+ loss=next_sentence_loss,
1504
+ logits=seq_relationship_scores,
1505
+ hidden_states=outputs.hidden_states,
1506
+ attentions=outputs.attentions,
1507
+ )
1508
+
1509
+
1510
+ @add_start_docstrings(
1511
+ """
1512
+ Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled
1513
+ output) e.g. for GLUE tasks.
1514
+ """,
1515
+ BERT_START_DOCSTRING,
1516
+ )
1517
+ class BertForSequenceClassification(BertPreTrainedModel):
1518
+ def __init__(self, config):
1519
+ super().__init__(config)
1520
+ self.num_labels = config.num_labels
1521
+ self.config = config
1522
+
1523
+ self.bert = BertModel(config)
1524
+ classifier_dropout = (
1525
+ config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
1526
+ )
1527
+ self.dropout = nn.Dropout(classifier_dropout)
1528
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
1529
+
1530
+ # Initialize weights and apply final processing
1531
+ self.post_init()
1532
+
1533
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1534
+ @add_code_sample_docstrings(
1535
+ checkpoint=_CHECKPOINT_FOR_SEQUENCE_CLASSIFICATION,
1536
+ output_type=SequenceClassifierOutput,
1537
+ config_class=_CONFIG_FOR_DOC,
1538
+ expected_output=_SEQ_CLASS_EXPECTED_OUTPUT,
1539
+ expected_loss=_SEQ_CLASS_EXPECTED_LOSS,
1540
+ )
1541
+ def forward(
1542
+ self,
1543
+ input_ids: Optional[torch.Tensor] = None,
1544
+ attention_mask: Optional[torch.Tensor] = None,
1545
+ token_type_ids: Optional[torch.Tensor] = None,
1546
+ position_ids: Optional[torch.Tensor] = None,
1547
+ head_mask: Optional[torch.Tensor] = None,
1548
+ inputs_embeds: Optional[torch.Tensor] = None,
1549
+ labels: Optional[torch.Tensor] = None,
1550
+ output_attentions: Optional[bool] = None,
1551
+ output_hidden_states: Optional[bool] = None,
1552
+ return_dict: Optional[bool] = None,
1553
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
1554
+ r"""
1555
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1556
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1557
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
1558
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1559
+ """
1560
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1561
+
1562
+ outputs = self.bert(
1563
+ input_ids,
1564
+ attention_mask=attention_mask,
1565
+ token_type_ids=token_type_ids,
1566
+ position_ids=position_ids,
1567
+ head_mask=head_mask,
1568
+ inputs_embeds=inputs_embeds,
1569
+ output_attentions=output_attentions,
1570
+ output_hidden_states=output_hidden_states,
1571
+ return_dict=return_dict,
1572
+ )
1573
+
1574
+ pooled_output = outputs[1]
1575
+
1576
+ pooled_output = self.dropout(pooled_output)
1577
+ logits = self.classifier(pooled_output)
1578
+
1579
+ loss = None
1580
+ if labels is not None:
1581
+ if self.config.problem_type is None:
1582
+ if self.num_labels == 1:
1583
+ self.config.problem_type = "regression"
1584
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
1585
+ self.config.problem_type = "single_label_classification"
1586
+ else:
1587
+ self.config.problem_type = "multi_label_classification"
1588
+
1589
+ if self.config.problem_type == "regression":
1590
+ loss_fct = MSELoss()
1591
+ if self.num_labels == 1:
1592
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
1593
+ else:
1594
+ loss = loss_fct(logits, labels)
1595
+ elif self.config.problem_type == "single_label_classification":
1596
+ loss_fct = CrossEntropyLoss()
1597
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
1598
+ elif self.config.problem_type == "multi_label_classification":
1599
+ loss_fct = BCEWithLogitsLoss()
1600
+ loss = loss_fct(logits, labels)
1601
+ if not return_dict:
1602
+ output = (logits,) + outputs[2:]
1603
+ return ((loss,) + output) if loss is not None else output
1604
+
1605
+ return SequenceClassifierOutput(
1606
+ loss=loss,
1607
+ logits=logits,
1608
+ hidden_states=outputs.hidden_states,
1609
+ attentions=outputs.attentions,
1610
+ )
1611
+
1612
+
1613
+ @add_start_docstrings(
1614
+ """
1615
+ Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a
1616
+ softmax) e.g. for RocStories/SWAG tasks.
1617
+ """,
1618
+ BERT_START_DOCSTRING,
1619
+ )
1620
+ class BertForMultipleChoice(BertPreTrainedModel):
1621
+ def __init__(self, config):
1622
+ super().__init__(config)
1623
+
1624
+ self.bert = BertModel(config)
1625
+ classifier_dropout = (
1626
+ config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
1627
+ )
1628
+ self.dropout = nn.Dropout(classifier_dropout)
1629
+ self.classifier = nn.Linear(config.hidden_size, 1)
1630
+
1631
+ # Initialize weights and apply final processing
1632
+ self.post_init()
1633
+
1634
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
1635
+ @add_code_sample_docstrings(
1636
+ checkpoint=_CHECKPOINT_FOR_DOC,
1637
+ output_type=MultipleChoiceModelOutput,
1638
+ config_class=_CONFIG_FOR_DOC,
1639
+ )
1640
+ def forward(
1641
+ self,
1642
+ input_ids: Optional[torch.Tensor] = None,
1643
+ attention_mask: Optional[torch.Tensor] = None,
1644
+ token_type_ids: Optional[torch.Tensor] = None,
1645
+ position_ids: Optional[torch.Tensor] = None,
1646
+ head_mask: Optional[torch.Tensor] = None,
1647
+ inputs_embeds: Optional[torch.Tensor] = None,
1648
+ labels: Optional[torch.Tensor] = None,
1649
+ output_attentions: Optional[bool] = None,
1650
+ output_hidden_states: Optional[bool] = None,
1651
+ return_dict: Optional[bool] = None,
1652
+ ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]:
1653
+ r"""
1654
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1655
+ Labels for computing the multiple choice classification loss. Indices should be in `[0, ...,
1656
+ num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See
1657
+ `input_ids` above)
1658
+ """
1659
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1660
+ num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
1661
+
1662
+ input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
1663
+ attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
1664
+ token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
1665
+ position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
1666
+ inputs_embeds = (
1667
+ inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
1668
+ if inputs_embeds is not None
1669
+ else None
1670
+ )
1671
+
1672
+ outputs = self.bert(
1673
+ input_ids,
1674
+ attention_mask=attention_mask,
1675
+ token_type_ids=token_type_ids,
1676
+ position_ids=position_ids,
1677
+ head_mask=head_mask,
1678
+ inputs_embeds=inputs_embeds,
1679
+ output_attentions=output_attentions,
1680
+ output_hidden_states=output_hidden_states,
1681
+ return_dict=return_dict,
1682
+ )
1683
+
1684
+ pooled_output = outputs[1]
1685
+
1686
+ pooled_output = self.dropout(pooled_output)
1687
+ logits = self.classifier(pooled_output)
1688
+ reshaped_logits = logits.view(-1, num_choices)
1689
+
1690
+ loss = None
1691
+ if labels is not None:
1692
+ loss_fct = CrossEntropyLoss()
1693
+ loss = loss_fct(reshaped_logits, labels)
1694
+
1695
+ if not return_dict:
1696
+ output = (reshaped_logits,) + outputs[2:]
1697
+ return ((loss,) + output) if loss is not None else output
1698
+
1699
+ return MultipleChoiceModelOutput(
1700
+ loss=loss,
1701
+ logits=reshaped_logits,
1702
+ hidden_states=outputs.hidden_states,
1703
+ attentions=outputs.attentions,
1704
+ )
1705
+
1706
+
1707
+ @add_start_docstrings(
1708
+ """
1709
+ Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for
1710
+ Named-Entity-Recognition (NER) tasks.
1711
+ """,
1712
+ BERT_START_DOCSTRING,
1713
+ )
1714
+ class BertForTokenClassification(BertPreTrainedModel):
1715
+ _keys_to_ignore_on_load_unexpected = [r"pooler"]
1716
+
1717
+ def __init__(self, config):
1718
+ super().__init__(config)
1719
+ self.num_labels = config.num_labels
1720
+
1721
+ self.bert = BertModel(config, add_pooling_layer=False)
1722
+ classifier_dropout = (
1723
+ config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
1724
+ )
1725
+ self.dropout = nn.Dropout(classifier_dropout)
1726
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
1727
+
1728
+ # Initialize weights and apply final processing
1729
+ self.post_init()
1730
+
1731
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1732
+ @add_code_sample_docstrings(
1733
+ checkpoint=_CHECKPOINT_FOR_TOKEN_CLASSIFICATION,
1734
+ output_type=TokenClassifierOutput,
1735
+ config_class=_CONFIG_FOR_DOC,
1736
+ expected_output=_TOKEN_CLASS_EXPECTED_OUTPUT,
1737
+ expected_loss=_TOKEN_CLASS_EXPECTED_LOSS,
1738
+ )
1739
+ def forward(
1740
+ self,
1741
+ input_ids: Optional[torch.Tensor] = None,
1742
+ attention_mask: Optional[torch.Tensor] = None,
1743
+ token_type_ids: Optional[torch.Tensor] = None,
1744
+ position_ids: Optional[torch.Tensor] = None,
1745
+ head_mask: Optional[torch.Tensor] = None,
1746
+ inputs_embeds: Optional[torch.Tensor] = None,
1747
+ labels: Optional[torch.Tensor] = None,
1748
+ output_attentions: Optional[bool] = None,
1749
+ output_hidden_states: Optional[bool] = None,
1750
+ return_dict: Optional[bool] = None,
1751
+ ) -> Union[Tuple[torch.Tensor], TokenClassifierOutput]:
1752
+ r"""
1753
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1754
+ Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
1755
+ """
1756
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1757
+
1758
+ outputs = self.bert(
1759
+ input_ids,
1760
+ attention_mask=attention_mask,
1761
+ token_type_ids=token_type_ids,
1762
+ position_ids=position_ids,
1763
+ head_mask=head_mask,
1764
+ inputs_embeds=inputs_embeds,
1765
+ output_attentions=output_attentions,
1766
+ output_hidden_states=output_hidden_states,
1767
+ return_dict=return_dict,
1768
+ )
1769
+
1770
+ sequence_output = outputs[0]
1771
+
1772
+ sequence_output = self.dropout(sequence_output)
1773
+ logits = self.classifier(sequence_output)
1774
+
1775
+ loss = None
1776
+ if labels is not None:
1777
+ loss_fct = CrossEntropyLoss()
1778
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
1779
+
1780
+ if not return_dict:
1781
+ output = (logits,) + outputs[2:]
1782
+ return ((loss,) + output) if loss is not None else output
1783
+
1784
+ return TokenClassifierOutput(
1785
+ loss=loss,
1786
+ logits=logits,
1787
+ hidden_states=outputs.hidden_states,
1788
+ attentions=outputs.attentions,
1789
+ )
1790
+
1791
+
1792
+ @add_start_docstrings(
1793
+ """
1794
+ Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
1795
+ layer on top of the hidden-states output to compute `span start logits` and `span end logits`).
1796
+ """,
1797
+ BERT_START_DOCSTRING,
1798
+ )
1799
+ class BertForQuestionAnswering(BertPreTrainedModel):
1800
+ _keys_to_ignore_on_load_unexpected = [r"pooler"]
1801
+
1802
+ def __init__(self, config):
1803
+ super().__init__(config)
1804
+ self.num_labels = config.num_labels
1805
+
1806
+ self.bert = BertModel(config, add_pooling_layer=False)
1807
+ self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
1808
+
1809
+ # Initialize weights and apply final processing
1810
+ self.post_init()
1811
+
1812
+ @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1813
+ @add_code_sample_docstrings(
1814
+ checkpoint=_CHECKPOINT_FOR_QA,
1815
+ output_type=QuestionAnsweringModelOutput,
1816
+ config_class=_CONFIG_FOR_DOC,
1817
+ qa_target_start_index=_QA_TARGET_START_INDEX,
1818
+ qa_target_end_index=_QA_TARGET_END_INDEX,
1819
+ expected_output=_QA_EXPECTED_OUTPUT,
1820
+ expected_loss=_QA_EXPECTED_LOSS,
1821
+ )
1822
+ def forward(
1823
+ self,
1824
+ input_ids: Optional[torch.Tensor] = None,
1825
+ attention_mask: Optional[torch.Tensor] = None,
1826
+ token_type_ids: Optional[torch.Tensor] = None,
1827
+ position_ids: Optional[torch.Tensor] = None,
1828
+ head_mask: Optional[torch.Tensor] = None,
1829
+ inputs_embeds: Optional[torch.Tensor] = None,
1830
+ start_positions: Optional[torch.Tensor] = None,
1831
+ end_positions: Optional[torch.Tensor] = None,
1832
+ output_attentions: Optional[bool] = None,
1833
+ output_hidden_states: Optional[bool] = None,
1834
+ return_dict: Optional[bool] = None,
1835
+ ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
1836
+ r"""
1837
+ start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1838
+ Labels for position (index) of the start of the labelled span for computing the token classification loss.
1839
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
1840
+ are not taken into account for computing the loss.
1841
+ end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1842
+ Labels for position (index) of the end of the labelled span for computing the token classification loss.
1843
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
1844
+ are not taken into account for computing the loss.
1845
+ """
1846
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1847
+
1848
+ outputs = self.bert(
1849
+ input_ids,
1850
+ attention_mask=attention_mask,
1851
+ token_type_ids=token_type_ids,
1852
+ position_ids=position_ids,
1853
+ head_mask=head_mask,
1854
+ inputs_embeds=inputs_embeds,
1855
+ output_attentions=output_attentions,
1856
+ output_hidden_states=output_hidden_states,
1857
+ return_dict=return_dict,
1858
+ )
1859
+
1860
+ sequence_output = outputs[0]
1861
+
1862
+ logits = self.qa_outputs(sequence_output)
1863
+ start_logits, end_logits = logits.split(1, dim=-1)
1864
+ start_logits = start_logits.squeeze(-1).contiguous()
1865
+ end_logits = end_logits.squeeze(-1).contiguous()
1866
+
1867
+ total_loss = None
1868
+ if start_positions is not None and end_positions is not None:
1869
+ # If we are on multi-GPU, splitting the inputs may add an extra dimension to the positions; squeeze it
1870
+ if len(start_positions.size()) > 1:
1871
+ start_positions = start_positions.squeeze(-1)
1872
+ if len(end_positions.size()) > 1:
1873
+ end_positions = end_positions.squeeze(-1)
1874
+ # sometimes the start/end positions are outside our model inputs; we ignore these terms
1875
+ ignored_index = start_logits.size(1)
1876
+ start_positions = start_positions.clamp(0, ignored_index)
1877
+ end_positions = end_positions.clamp(0, ignored_index)
1878
+
1879
+ loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
1880
+ start_loss = loss_fct(start_logits, start_positions)
1881
+ end_loss = loss_fct(end_logits, end_positions)
1882
+ total_loss = (start_loss + end_loss) / 2
1883
+
1884
+ if not return_dict:
1885
+ output = (start_logits, end_logits) + outputs[2:]
1886
+ return ((total_loss,) + output) if total_loss is not None else output
1887
+
1888
+ return QuestionAnsweringModelOutput(
1889
+ loss=total_loss,
1890
+ start_logits=start_logits,
1891
+ end_logits=end_logits,
1892
+ hidden_states=outputs.hidden_states,
1893
+ attentions=outputs.attentions,
1894
+ )
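As a quick sanity check of the custom head defined above, here is a minimal, hedged sketch that instantiates `BertCustomLMHeadModel` directly from a small `BertConfig` and runs one forward pass. The config values are illustrative placeholders rather than this checkpoint's actual hyperparameters (only the 1124-token vocabulary is taken from the tokenizer files below), and the import assumes `modeling_bert.py` is importable from the working directory.

```python
# Sketch only: tiny illustrative config, not the checkpoint's real hyperparameters.
import torch
from transformers import BertConfig

from modeling_bert import BertCustomLMHeadModel  # the class defined in the file above

config = BertConfig(
    vocab_size=1124,        # matches the 1124-entry vocab.txt / tokenizer.json below
    hidden_size=32,         # placeholder width
    num_hidden_layers=2,    # placeholder depth
    num_attention_heads=4,  # placeholder head count
    intermediate_size=64,   # placeholder FFN size
    is_decoder=True,        # avoids the "add `is_decoder=True`" warning in __init__
)

model = BertCustomLMHeadModel(config)
model.eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    out = model(input_ids=input_ids, labels=input_ids, return_dict=True)

# CausalLMOutputWithCrossAttentions: logits are (batch, seq_len, vocab_size), and the
# loss is the shifted next-token cross-entropy computed inside forward() above.
print(out.logits.shape, out.loss)
```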
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:545d8feae7cdaa752dfcecd8d480928b31a0f7a0b494877c9ab5ddf504906703
3
+ size 383481
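The three lines above are only the Git LFS pointer (version, sha256 oid, byte size), not the weights themselves. Once the LFS object has actually been fetched, the checkpoint can be inspected as an ordinary PyTorch state dict, as in this small sketch; the local path is an assumption.

```python
# Sketch: inspect pytorch_model.bin after the Git LFS object has been pulled.
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")  # local path is an assumption
print(len(state_dict), "tensors")
print(sum(t.numel() for t in state_dict.values()), "parameters")
```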
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8961e0116b64f7aa000cdee56f226922e47168126dfc846a85b935b259311edf
3
+ size 472416
tokenizer.json ADDED
@@ -0,0 +1,1274 @@
1
+ {
2
+ "version": "1.0",
3
+ "truncation": null,
4
+ "padding": null,
5
+ "added_tokens": [
6
+ {
7
+ "id": 0,
8
+ "content": "[PAD]",
9
+ "single_word": false,
10
+ "lstrip": false,
11
+ "rstrip": false,
12
+ "normalized": false,
13
+ "special": true
14
+ },
15
+ {
16
+ "id": 1,
17
+ "content": "[UNK]",
18
+ "single_word": false,
19
+ "lstrip": false,
20
+ "rstrip": false,
21
+ "normalized": false,
22
+ "special": true
23
+ },
24
+ {
25
+ "id": 2,
26
+ "content": "[CLS]",
27
+ "single_word": false,
28
+ "lstrip": false,
29
+ "rstrip": false,
30
+ "normalized": false,
31
+ "special": true
32
+ },
33
+ {
34
+ "id": 3,
35
+ "content": "[SEP]",
36
+ "single_word": false,
37
+ "lstrip": false,
38
+ "rstrip": false,
39
+ "normalized": false,
40
+ "special": true
41
+ },
42
+ {
43
+ "id": 4,
44
+ "content": "[MASK]",
45
+ "single_word": false,
46
+ "lstrip": false,
47
+ "rstrip": false,
48
+ "normalized": false,
49
+ "special": true
50
+ }
51
+ ],
52
+ "normalizer": {
53
+ "type": "BertNormalizer",
54
+ "clean_text": true,
55
+ "handle_chinese_chars": true,
56
+ "strip_accents": null,
57
+ "lowercase": true
58
+ },
59
+ "pre_tokenizer": {
60
+ "type": "BertPreTokenizer"
61
+ },
62
+ "post_processor": {
63
+ "type": "TemplateProcessing",
64
+ "single": [
65
+ {
66
+ "SpecialToken": {
67
+ "id": "[CLS]",
68
+ "type_id": 0
69
+ }
70
+ },
71
+ {
72
+ "Sequence": {
73
+ "id": "A",
74
+ "type_id": 0
75
+ }
76
+ },
77
+ {
78
+ "SpecialToken": {
79
+ "id": "[SEP]",
80
+ "type_id": 0
81
+ }
82
+ }
83
+ ],
84
+ "pair": [
85
+ {
86
+ "SpecialToken": {
87
+ "id": "[CLS]",
88
+ "type_id": 0
89
+ }
90
+ },
91
+ {
92
+ "Sequence": {
93
+ "id": "A",
94
+ "type_id": 0
95
+ }
96
+ },
97
+ {
98
+ "SpecialToken": {
99
+ "id": "[SEP]",
100
+ "type_id": 0
101
+ }
102
+ },
103
+ {
104
+ "Sequence": {
105
+ "id": "B",
106
+ "type_id": 1
107
+ }
108
+ },
109
+ {
110
+ "SpecialToken": {
111
+ "id": "[SEP]",
112
+ "type_id": 1
113
+ }
114
+ }
115
+ ],
116
+ "special_tokens": {
117
+ "[CLS]": {
118
+ "id": "[CLS]",
119
+ "ids": [
120
+ 2
121
+ ],
122
+ "tokens": [
123
+ "[CLS]"
124
+ ]
125
+ },
126
+ "[SEP]": {
127
+ "id": "[SEP]",
128
+ "ids": [
129
+ 3
130
+ ],
131
+ "tokens": [
132
+ "[SEP]"
133
+ ]
134
+ }
135
+ }
136
+ },
137
+ "decoder": {
138
+ "type": "WordPiece",
139
+ "prefix": "##",
140
+ "cleanup": true
141
+ },
142
+ "model": {
143
+ "type": "WordPiece",
144
+ "unk_token": "[UNK]",
145
+ "continuing_subword_prefix": "##",
146
+ "max_input_chars_per_word": 100,
147
+ "vocab": {
148
+ "[PAD]": 0,
149
+ "[UNK]": 1,
150
+ "[CLS]": 2,
151
+ "[SEP]": 3,
152
+ "[MASK]": 4,
153
+ "!": 5,
154
+ "\"": 6,
155
+ "#": 7,
156
+ "$": 8,
157
+ "%": 9,
158
+ "&": 10,
159
+ "'": 11,
160
+ "(": 12,
161
+ ")": 13,
162
+ "*": 14,
163
+ "+": 15,
164
+ ",": 16,
165
+ "-": 17,
166
+ ".": 18,
167
+ "/": 19,
168
+ "0": 20,
169
+ "1": 21,
170
+ "2": 22,
171
+ "3": 23,
172
+ "4": 24,
173
+ "5": 25,
174
+ "6": 26,
175
+ "7": 27,
176
+ "8": 28,
177
+ "9": 29,
178
+ ":": 30,
179
+ ";": 31,
180
+ "<": 32,
181
+ "=": 33,
182
+ ">": 34,
183
+ "?": 35,
184
+ "@": 36,
185
+ "[": 37,
186
+ "\\": 38,
187
+ "]": 39,
188
+ "^": 40,
189
+ "_": 41,
190
+ "`": 42,
191
+ "a": 43,
192
+ "b": 44,
193
+ "c": 45,
194
+ "d": 46,
195
+ "e": 47,
196
+ "f": 48,
197
+ "g": 49,
198
+ "h": 50,
199
+ "i": 51,
200
+ "j": 52,
201
+ "k": 53,
202
+ "l": 54,
203
+ "m": 55,
204
+ "n": 56,
205
+ "o": 57,
206
+ "p": 58,
207
+ "q": 59,
208
+ "r": 60,
209
+ "s": 61,
210
+ "t": 62,
211
+ "u": 63,
212
+ "v": 64,
213
+ "w": 65,
214
+ "x": 66,
215
+ "y": 67,
216
+ "z": 68,
217
+ "|": 69,
218
+ "}": 70,
219
+ "~": 71,
220
+ "¡": 72,
221
+ "¢": 73,
222
+ "£": 74,
223
+ "¥": 75,
224
+ "§": 76,
225
+ "°": 77,
226
+ "±": 78,
227
+ "²": 79,
228
+ "³": 80,
229
+ "´": 81,
230
+ "µ": 82,
231
+ "·": 83,
232
+ "º": 84,
233
+ "½": 85,
234
+ "¿": 86,
235
+ "×": 87,
236
+ "ß": 88,
237
+ "æ": 89,
238
+ "ð": 90,
239
+ "ø": 91,
240
+ "þ": 92,
241
+ "đ": 93,
242
+ "ħ": 94,
243
+ "ı": 95,
244
+ "ł": 96,
245
+ "œ": 97,
246
+ "ɐ": 98,
247
+ "ɑ": 99,
248
+ "ɒ": 100,
249
+ "ɔ": 101,
250
+ "ə": 102,
251
+ "ɛ": 103,
252
+ "ɜ": 104,
253
+ "ɡ": 105,
254
+ "ɢ": 106,
255
+ "ɪ": 107,
256
+ "ɫ": 108,
257
+ "ɳ": 109,
258
+ "ɽ": 110,
259
+ "ɾ": 111,
260
+ "ʁ": 112,
261
+ "ʃ": 113,
262
+ "ʊ": 114,
263
+ "ʋ": 115,
264
+ "ʒ": 116,
265
+ "ʔ": 117,
266
+ "ʕ": 118,
267
+ "ʲ": 119,
268
+ "ʻ": 120,
269
+ "ʼ": 121,
270
+ "ʾ": 122,
271
+ "ʿ": 123,
272
+ "ˈ": 124,
273
+ "ˌ": 125,
274
+ "ː": 126,
275
+ "α": 127,
276
+ "β": 128,
277
+ "γ": 129,
278
+ "δ": 130,
279
+ "ε": 131,
280
+ "η": 132,
281
+ "θ": 133,
282
+ "ι": 134,
283
+ "κ": 135,
284
+ "λ": 136,
285
+ "μ": 137,
286
+ "��": 138,
287
+ "ξ": 139,
288
+ "ο": 140,
289
+ "π": 141,
290
+ "ρ": 142,
291
+ "ς": 143,
292
+ "σ": 144,
293
+ "τ": 145,
294
+ "υ": 146,
295
+ "φ": 147,
296
+ "χ": 148,
297
+ "ψ": 149,
298
+ "ω": 150,
299
+ "а": 151,
300
+ "б": 152,
301
+ "в": 153,
302
+ "г": 154,
303
+ "д": 155,
304
+ "е": 156,
305
+ "ж": 157,
306
+ "з": 158,
307
+ "и": 159,
308
+ "к": 160,
309
+ "л": 161,
310
+ "м": 162,
311
+ "н": 163,
312
+ "о": 164,
313
+ "п": 165,
314
+ "р": 166,
315
+ "с": 167,
316
+ "т": 168,
317
+ "у": 169,
318
+ "х": 170,
319
+ "ц": 171,
320
+ "ш": 172,
321
+ "ъ": 173,
322
+ "ы": 174,
323
+ "ь": 175,
324
+ "ю": 176,
325
+ "я": 177,
326
+ "є": 178,
327
+ "א": 179,
328
+ "ב": 180,
329
+ "ג": 181,
330
+ "ה": 182,
331
+ "ו": 183,
332
+ "ז": 184,
333
+ "ח": 185,
334
+ "י": 186,
335
+ "ל": 187,
336
+ "ם": 188,
337
+ "מ": 189,
338
+ "ן": 190,
339
+ "נ": 191,
340
+ "ס": 192,
341
+ "ף": 193,
342
+ "פ": 194,
343
+ "צ": 195,
344
+ "ר": 196,
345
+ "ש": 197,
346
+ "ת": 198,
347
+ "ء": 199,
348
+ "ا": 200,
349
+ "ب": 201,
350
+ "ة": 202,
351
+ "ت": 203,
352
+ "ث": 204,
353
+ "ج": 205,
354
+ "ح": 206,
355
+ "خ": 207,
356
+ "د": 208,
357
+ "ذ": 209,
358
+ "ر": 210,
359
+ "س": 211,
360
+ "ش": 212,
361
+ "ص": 213,
362
+ "ع": 214,
363
+ "ف": 215,
364
+ "ق": 216,
365
+ "ك": 217,
366
+ "ل": 218,
367
+ "م": 219,
368
+ "ن": 220,
369
+ "ه": 221,
370
+ "و": 222,
371
+ "ي": 223,
372
+ "ܐ": 224,
373
+ "ܕ": 225,
374
+ "ܗ": 226,
375
+ "ܝ": 227,
376
+ "ܠ": 228,
377
+ "ܢ": 229,
378
+ "ܬ": 230,
379
+ "अ": 231,
380
+ "ई": 232,
381
+ "क": 233,
382
+ "ग": 234,
383
+ "ण": 235,
384
+ "त": 236,
385
+ "द": 237,
386
+ "न": 238,
387
+ "प": 239,
388
+ "ब": 240,
389
+ "म": 241,
390
+ "य": 242,
391
+ "र": 243,
392
+ "ल": 244,
393
+ "व": 245,
394
+ "स": 246,
395
+ "ह": 247,
396
+ "ा": 248,
397
+ "ि": 249,
398
+ "আ": 250,
399
+ "ল": 251,
400
+ "হ": 252,
401
+ "া": 253,
402
+ "ਅ": 254,
403
+ "ਲ": 255,
404
+ "ਹ": 256,
405
+ "ਾ": 257,
406
+ "അ": 258,
407
+ "ള": 259,
408
+ "ഹ": 260,
409
+ "ാ": 261,
410
+ "ก": 262,
411
+ "ค": 263,
412
+ "ง": 264,
413
+ "ช": 265,
414
+ "ซ": 266,
415
+ "ญ": 267,
416
+ "ฐ": 268,
417
+ "ณ": 269,
418
+ "ด": 270,
419
+ "ต": 271,
420
+ "น": 272,
421
+ "บ": 273,
422
+ "ป": 274,
423
+ "พ": 275,
424
+ "ภ": 276,
425
+ "ม": 277,
426
+ "ย": 278,
427
+ "ร": 279,
428
+ "ล": 280,
429
+ "ว": 281,
430
+ "ศ": 282,
431
+ "ษ": 283,
432
+ "ส": 284,
433
+ "ห": 285,
434
+ "อ": 286,
435
+ "ฮ": 287,
436
+ "ะ": 288,
437
+ "า": 289,
438
+ "เ": 290,
439
+ "แ": 291,
440
+ "ไ": 292,
441
+ "ა": 293,
442
+ "ბ": 294,
443
+ "გ": 295,
444
+ "დ": 296,
445
+ "ე": 297,
446
+ "ვ": 298,
447
+ "ზ": 299,
448
+ "თ": 300,
449
+ "ი": 301,
450
+ "კ": 302,
451
+ "ლ": 303,
452
+ "მ": 304,
453
+ "ნ": 305,
454
+ "ო": 306,
455
+ "პ": 307,
456
+ "ჟ": 308,
457
+ "რ": 309,
458
+ "ს": 310,
459
+ "ტ": 311,
460
+ "უ": 312,
461
+ "ფ": 313,
462
+ "ქ": 314,
463
+ "ღ": 315,
464
+ "ყ": 316,
465
+ "შ": 317,
466
+ "ჩ": 318,
467
+ "ც": 319,
468
+ "ძ": 320,
469
+ "წ": 321,
470
+ "ჭ": 322,
471
+ "ხ": 323,
472
+ "ჯ": 324,
473
+ "ჰ": 325,
474
+ "ჱ": 326,
475
+ "ჲ": 327,
476
+ "ჳ": 328,
477
+ "ჴ": 329,
478
+ "ჵ": 330,
479
+ "ჶ": 331,
480
+ "ჷ": 332,
481
+ "ჸ": 333,
482
+ "ჹ": 334,
483
+ "ჺ": 335,
484
+ "჻": 336,
485
+ "ᄃ": 337,
486
+ "ᄅ": 338,
487
+ "ᄇ": 339,
488
+ "ᄋ": 340,
489
+ "ᄌ": 341,
490
+ "ᅡ": 342,
491
+ "ᅢ": 343,
492
+ "ᅦ": 344,
493
+ "ᅧ": 345,
494
+ "ᅩ": 346,
495
+ "ᅮ": 347,
496
+ "ᅵ": 348,
497
+ "ᆨ": 349,
498
+ "ᆫ": 350,
499
+ "ᆯ": 351,
500
+ "ᆸ": 352,
501
+ "ᆼ": 353,
502
+ "ᵻ": 354,
503
+ "‐": 355,
504
+ "‑": 356,
505
+ "–": 357,
506
+ "—": 358,
507
+ "―": 359,
508
+ "‘": 360,
509
+ "’": 361,
510
+ "“": 362,
511
+ "”": 363,
512
+ "„": 364,
513
+ "†": 365,
514
+ "‡": 366,
515
+ "•": 367,
516
+ "…": 368,
517
+ "′": 369,
518
+ "″": 370,
519
+ "⁄": 371,
520
+ "₣": 372,
521
+ "₤": 373,
522
+ "€": 374,
523
+ "₹": 375,
524
+ "⅓": 376,
525
+ "⅔": 377,
526
+ "→": 378,
527
+ "−": 379,
528
+ "≡": 380,
529
+ "≤": 381,
530
+ "①": 382,
531
+ "☉": 383,
532
+ "☫": 384,
533
+ "♀": 385,
534
+ "♭": 386,
535
+ "♯": 387,
536
+ "⚳": 388,
537
+ "ⴀ": 389,
538
+ "ⴂ": 390,
539
+ "ⴃ": 391,
540
+ "ⴈ": 392,
541
+ "ⴌ": 393,
542
+ "ⴕ": 394,
543
+ "ⴟ": 395,
544
+ "〈": 396,
545
+ "〉": 397,
546
+ "〜": 398,
547
+ "あ": 399,
548
+ "い": 400,
549
+ "う": 401,
550
+ "お": 402,
551
+ "か": 403,
552
+ "き": 404,
553
+ "く": 405,
554
+ "け": 406,
555
+ "こ": 407,
556
+ "さ": 408,
557
+ "し": 409,
558
+ "す": 410,
559
+ "せ": 411,
560
+ "た": 412,
561
+ "ち": 413,
562
+ "っ": 414,
563
+ "つ": 415,
564
+ "と": 416,
565
+ "な": 417,
566
+ "に": 418,
567
+ "の": 419,
568
+ "は": 420,
569
+ "ひ": 421,
570
+ "ふ": 422,
571
+ "ほ": 423,
572
+ "ま": 424,
573
+ "み": 425,
574
+ "め": 426,
575
+ "も": 427,
576
+ "ゃ": 428,
577
+ "ゆ": 429,
578
+ "ょ": 430,
579
+ "ら": 431,
580
+ "り": 432,
581
+ "る": 433,
582
+ "れ": 434,
583
+ "わ": 435,
584
+ "を": 436,
585
+ "ん": 437,
586
+ "ァ": 438,
587
+ "ア": 439,
588
+ "ィ": 440,
589
+ "イ": 441,
590
+ "ゥ": 442,
591
+ "ウ": 443,
592
+ "ェ": 444,
593
+ "エ": 445,
594
+ "ォ": 446,
595
+ "オ": 447,
596
+ "カ": 448,
597
+ "キ": 449,
598
+ "ク": 450,
599
+ "ケ": 451,
600
+ "コ": 452,
601
+ "サ": 453,
602
+ "シ": 454,
603
+ "ス": 455,
604
+ "セ": 456,
605
+ "タ": 457,
606
+ "チ": 458,
607
+ "ッ": 459,
608
+ "ツ": 460,
609
+ "テ": 461,
610
+ "ト": 462,
611
+ "ナ": 463,
612
+ "ニ": 464,
613
+ "ネ": 465,
614
+ "ノ": 466,
615
+ "ハ": 467,
616
+ "フ": 468,
617
+ "ヘ": 469,
618
+ "マ": 470,
619
+ "ミ": 471,
620
+ "ム": 472,
621
+ "モ": 473,
622
+ "ャ": 474,
623
+ "ュ": 475,
624
+ "ョ": 476,
625
+ "ラ": 477,
626
+ "リ": 478,
627
+ "ル": 479,
628
+ "レ": 480,
629
+ "ロ": 481,
630
+ "ン": 482,
631
+ "・": 483,
632
+ "ー": 484,
633
+ "一": 485,
634
+ "七": 486,
635
+ "下": 487,
636
+ "世": 488,
637
+ "丙": 489,
638
+ "中": 490,
639
+ "主": 491,
640
+ "乃": 492,
641
+ "之": 493,
642
+ "乙": 494,
643
+ "九": 495,
644
+ "二": 496,
645
+ "云": 497,
646
+ "人": 498,
647
+ "今": 499,
648
+ "付": 500,
649
+ "作": 501,
650
+ "侗": 502,
651
+ "依": 503,
652
+ "信": 504,
653
+ "傳": 505,
654
+ "儚": 506,
655
+ "充": 507,
656
+ "光": 508,
657
+ "全": 509,
658
+ "兵": 510,
659
+ "其": 511,
660
+ "具": 512,
661
+ "円": 513,
662
+ "再": 514,
663
+ "出": 515,
664
+ "判": 516,
665
+ "前": 517,
666
+ "剛": 518,
667
+ "劇": 519,
668
+ "劉": 520,
669
+ "動": 521,
670
+ "化": 522,
671
+ "北": 523,
672
+ "华": 524,
673
+ "厂": 525,
674
+ "去": 526,
675
+ "古": 527,
676
+ "可": 528,
677
+ "台": 529,
678
+ "史": 530,
679
+ "同": 531,
680
+ "名": 532,
681
+ "君": 533,
682
+ "吳": 534,
683
+ "周": 535,
684
+ "命": 536,
685
+ "和": 537,
686
+ "咲": 538,
687
+ "善": 539,
688
+ "四": 540,
689
+ "國": 541,
690
+ "園": 542,
691
+ "圣": 543,
692
+ "在": 544,
693
+ "坂": 545,
694
+ "堤": 546,
695
+ "場": 547,
696
+ "塘": 548,
697
+ "夕": 549,
698
+ "大": 550,
699
+ "天": 551,
700
+ "夫": 552,
701
+ "女": 553,
702
+ "妙": 554,
703
+ "姚": 555,
704
+ "子": 556,
705
+ "孟": 557,
706
+ "守": 558,
707
+ "安": 559,
708
+ "宋": 560,
709
+ "完": 561,
710
+ "宗": 562,
711
+ "宝": 563,
712
+ "宫": 564,
713
+ "寝": 565,
714
+ "寺": 566,
715
+ "小": 567,
716
+ "少": 568,
717
+ "尾": 569,
718
+ "山": 570,
719
+ "岳": 571,
720
+ "川": 572,
721
+ "州": 573,
722
+ "巳": 574,
723
+ "市": 575,
724
+ "師": 576,
725
+ "平": 577,
726
+ "广": 578,
727
+ "庆": 579,
728
+ "府": 580,
729
+ "座": 581,
730
+ "廬": 582,
731
+ "建": 583,
732
+ "式": 584,
733
+ "張": 585,
734
+ "彌": 586,
735
+ "彩": 587,
736
+ "彼": 588,
737
+ "後": 589,
738
+ "御": 590,
739
+ "德": 591,
740
+ "思": 592,
741
+ "愛": 593,
742
+ "憑": 594,
743
+ "憶": 595,
744
+ "應": 596,
745
+ "懷": 597,
746
+ "战": 598,
747
+ "戦": 599,
748
+ "扈": 600,
749
+ "技": 601,
750
+ "拉": 602,
751
+ "拳": 603,
752
+ "挑": 604,
753
+ "揺": 605,
754
+ "攻": 606,
755
+ "放": 607,
756
+ "政": 608,
757
+ "散": 609,
758
+ "斯": 610,
759
+ "方": 611,
760
+ "日": 612,
761
+ "旦": 613,
762
+ "旭": 614,
763
+ "昌": 615,
764
+ "明": 616,
765
+ "星": 617,
766
+ "春": 618,
767
+ "晋": 619,
768
+ "景": 620,
769
+ "曦": 621,
770
+ "月": 622,
771
+ "望": 623,
772
+ "未": 624,
773
+ "本": 625,
774
+ "李": 626,
775
+ "村": 627,
776
+ "杜": 628,
777
+ "束": 629,
778
+ "来": 630,
779
+ "林": 631,
780
+ "桜": 632,
781
+ "梶": 633,
782
+ "棘": 634,
783
+ "椎": 635,
784
+ "楊": 636,
785
+ "楚": 637,
786
+ "榮": 638,
787
+ "橘": 639,
788
+ "機": 640,
789
+ "正": 641,
790
+ "殻": 642,
791
+ "殿": 643,
792
+ "母": 644,
793
+ "水": 645,
794
+ "汉": 646,
795
+ "沂": 647,
796
+ "沙": 648,
797
+ "河": 649,
798
+ "泗": 650,
799
+ "波": 651,
800
+ "泣": 652,
801
+ "洪": 653,
802
+ "淹": 654,
803
+ "清": 655,
804
+ "湯": 656,
805
+ "漢": 657,
806
+ "澄": 658,
807
+ "澤": 659,
808
+ "火": 660,
809
+ "灯": 661,
810
+ "灵": 662,
811
+ "灼": 663,
812
+ "焼": 664,
813
+ "熱": 665,
814
+ "物": 666,
815
+ "狐": 667,
816
+ "狸": 668,
817
+ "玄": 669,
818
+ "王": 670,
819
+ "玩": 671,
820
+ "珂": 672,
821
+ "珙": 673,
822
+ "球": 674,
823
+ "理": 675,
824
+ "琦": 676,
825
+ "琪": 677,
826
+ "瓊": 678,
827
+ "生": 679,
828
+ "田": 680,
829
+ "畢": 681,
830
+ "番": 682,
831
+ "瘡": 683,
832
+ "白": 684,
833
+ "皮": 685,
834
+ "真": 686,
835
+ "砲": 687,
836
+ "礮": 688,
837
+ "祈": 689,
838
+ "神": 690,
839
+ "祠": 691,
840
+ "秋": 692,
841
+ "空": 693,
842
+ "立": 694,
843
+ "精": 695,
844
+ "約": 696,
845
+ "絵": 697,
846
+ "織": 698,
847
+ "義": 699,
848
+ "翠": 700,
849
+ "者": 701,
850
+ "耕": 702,
851
+ "肖": 703,
852
+ "胡": 704,
853
+ "膀": 705,
854
+ "臂": 706,
855
+ "興": 707,
856
+ "良": 708,
857
+ "花": 709,
858
+ "芳": 710,
859
+ "芽": 711,
860
+ "若": 712,
861
+ "英": 713,
862
+ "藕": 714,
863
+ "藥": 715,
864
+ "蘄": 716,
865
+ "蘇": 717,
866
+ "行": 718,
867
+ "裁": 719,
868
+ "規": 720,
869
+ "覺": 721,
870
+ "观": 722,
871
+ "解": 723,
872
+ "記": 724,
873
+ "誓": 725,
874
+ "誡": 726,
875
+ "誰": 727,
876
+ "謎": 728,
877
+ "许": 729,
878
+ "谭": 730,
879
+ "豪": 731,
880
+ "豫": 732,
881
+ "費": 733,
882
+ "贵": 734,
883
+ "赤": 735,
884
+ "趙": 736,
885
+ "足": 737,
886
+ "跡": 738,
887
+ "転": 739,
888
+ "辛": 740,
889
+ "逆": 741,
890
+ "遇": 742,
891
+ "運": 743,
892
+ "過": 744,
893
+ "遠": 745,
894
+ "選": 746,
895
+ "邦": 747,
896
+ "邱": 748,
897
+ "部": 749,
898
+ "郭": 750,
899
+ "都": 751,
900
+ "酈": 752,
901
+ "里": 753,
902
+ "野": 754,
903
+ "金": 755,
904
+ "銃": 756,
905
+ "鋼": 757,
906
+ "錄": 758,
907
+ "錡": 759,
908
+ "鍵": 760,
909
+ "鐵": 761,
910
+ "钱": 762,
911
+ "铁": 763,
912
+ "關": 764,
913
+ "防": 765,
914
+ "阿": 766,
915
+ "陈": 767,
916
+ "陳": 768,
917
+ "陽": 769,
918
+ "隊": 770,
919
+ "階": 771,
920
+ "集": 772,
921
+ "雪": 773,
922
+ "雲": 774,
923
+ "霖": 775,
924
+ "霹": 776,
925
+ "靂": 777,
926
+ "韓": 778,
927
+ "願": 779,
928
+ "顯": 780,
929
+ "颜": 781,
930
+ "马": 782,
931
+ "高": 783,
932
+ "龍": 784,
933
+ "ﷲ": 785,
934
+ "ﻋ": 786,
935
+ "/": 787,
936
+ "3": 788,
937
+ "~": 789,
938
+ "##i": 790,
939
+ "##y": 791,
940
+ "##o": 792,
941
+ "##r": 793,
942
+ "##g": 794,
943
+ "##a": 795,
944
+ "##w": 796,
945
+ "##l": 797,
946
+ "##b": 798,
947
+ "##z": 799,
948
+ "##t": 800,
949
+ "##n": 801,
950
+ "##c": 802,
951
+ "##h": 803,
952
+ "##s": 804,
953
+ "##u": 805,
954
+ "##d": 806,
955
+ "##e": 807,
956
+ "##k": 808,
957
+ "##v": 809,
958
+ "##f": 810,
959
+ "##x": 811,
960
+ "##q": 812,
961
+ "##p": 813,
962
+ "##æ": 814,
963
+ "##0": 815,
964
+ "##5": 816,
965
+ "##m": 817,
966
+ "##8": 818,
967
+ "##4": 819,
968
+ "##س": 820,
969
+ "##ت": 821,
970
+ "##ا": 822,
971
+ "##ن": 823,
972
+ "##6": 824,
973
+ "##1": 825,
974
+ "##7": 826,
975
+ "##j": 827,
976
+ "##つ": 828,
977
+ "##う": 829,
978
+ "##2": 830,
979
+ "##9": 831,
980
+ "##3": 832,
981
+ "##ø": 833,
982
+ "##ล": 834,
983
+ "##ว": 835,
984
+ "##ง": 836,
985
+ "##พ": 837,
986
+ "##ไ": 838,
987
+ "##ช": 839,
988
+ "##ย": 840,
989
+ "##า": 841,
990
+ "##ร": 842,
991
+ "##თ": 843,
992
+ "##ა": 844,
993
+ "##ვ": 845,
994
+ "##რ": 846,
995
+ "##ი": 847,
996
+ "##ള": 848,
997
+ "##あ": 849,
998
+ "##ん": 850,
999
+ "##α": 851,
1000
+ "##ν": 852,
1001
+ "##τ": 853,
1002
+ "##ο": 854,
1003
+ "##κ": 855,
1004
+ "##ρ": 856,
1005
+ "##ω": 857,
1006
+ "##ς": 858,
1007
+ "##の": 859,
1008
+ "##な": 860,
1009
+ "##ら": 861,
1010
+ "##ð": 862,
1011
+ "##œ": 863,
1012
+ "##ɛ": 864,
1013
+ "##ł": 865,
1014
+ "##η": 866,
1015
+ "##μ": 867,
1016
+ "##ซ": 868,
1017
+ "##ル": 869,
1018
+ "##シ": 870,
1019
+ "##ア": 871,
1020
+ "##リ": 872,
1021
+ "##ス": 873,
1022
+ "##ʔ": 874,
1023
+ "##ल": 875,
1024
+ "##ᄇ": 876,
1025
+ "##ᅮ": 877,
1026
+ "##ᄃ": 878,
1027
+ "##ᅢ": 879,
1028
+ "##β": 880,
1029
+ "##ß": 881,
1030
+ "##か": 882,
1031
+ "##た": 883,
1032
+ "##ə": 884,
1033
+ "##ʻ": 885,
1034
+ "##ι": 886,
1035
+ "##χ": 887,
1036
+ "##о": 888,
1037
+ "##л": 889,
1038
+ "##с": 890,
1039
+ "##а": 891,
1040
+ "##т": 892,
1041
+ "##ы": 893,
1042
+ "##и": 894,
1043
+ "##в": 895,
1044
+ "##к": 896,
1045
+ "##з": 897,
1046
+ "##ッ": 898,
1047
+ "##ク": 899,
1048
+ "##マ": 900,
1049
+ "##ン": 901,
1050
+ "##გ": 902,
1051
+ "##ლ": 903,
1052
+ "##ო": 904,
1053
+ "##ნ": 905,
1054
+ "##ː": 906,
1055
+ "##ל": 907,
1056
+ "##ה": 908,
1057
+ "##א": 909,
1058
+ "##く": 910,
1059
+ "##み": 911,
1060
+ "##ε": 912,
1061
+ "##ξ": 913,
1062
+ "##ল": 914,
1063
+ "##ˈ": 915,
1064
+ "##ɡ": 916,
1065
+ "##ɑ": 917,
1066
+ "##ɒ": 918,
1067
+ "##し": 919,
1068
+ "##す": 920,
1069
+ "##き": 921,
1070
+ "##ひ": 922,
1071
+ "##と": 923,
1072
+ "##đ": 924,
1073
+ "##ъ": 925,
1074
+ "##н": 926,
1075
+ "##е": 927,
1076
+ "##י": 928,
1077
+ "##פ": 929,
1078
+ "##イ": 930,
1079
+ "##λ": 931,
1080
+ "##ق": 932,
1081
+ "##ع": 933,
1082
+ "##د": 934,
1083
+ "##ᅡ": 935,
1084
+ "##ᆯ": 936,
1085
+ "##ᄅ": 937,
1086
+ "##ɪ": 938,
1087
+ "##ค": 939,
1088
+ "##ต": 940,
1089
+ "##व": 941,
1090
+ "##��": 942,
1091
+ "##द": 943,
1092
+ "##は": 944,
1093
+ "##り": 945,
1094
+ "##レ": 946,
1095
+ "##ー": 947,
1096
+ "##ツ": 948,
1097
+ "##ي": 949,
1098
+ "##ش": 950,
1099
+ "##و": 951,
1100
+ "##م": 952,
1101
+ "##º": 953,
1102
+ "##ਲ": 954,
1103
+ "##ਾ": 955,
1104
+ "##ਹ": 956,
1105
+ "##д": 957,
1106
+ "##р": 958,
1107
+ "##ل": 959,
1108
+ "##ب": 960,
1109
+ "##い": 961,
1110
+ "##ち": 962,
1111
+ "##ゃ": 963,
1112
+ "##ʒ": 964,
1113
+ "##ʃ": 965,
1114
+ "##ɔ": 966,
1115
+ "##ह": 967,
1116
+ "##ニ": 968,
1117
+ "##ウ": 969,
1118
+ "##ァ": 970,
1119
+ "##キ": 971,
1120
+ "##ュ": 972,
1121
+ "##3": 973,
1122
+ "##ხ": 974,
1123
+ "##ს": 975,
1124
+ "##お": 976,
1125
+ "##タ": 977,
1126
+ "##ാ": 978,
1127
+ "##ഹ": 979,
1128
+ "##ɳ": 980,
1129
+ "##ま": 981,
1130
+ "##る": 982,
1131
+ "##ะ": 983,
1132
+ "##อ": 984,
1133
+ "##น": 985,
1134
+ "##ן": 986,
1135
+ "##я": 987,
1136
+ "##แ": 988,
1137
+ "##ก": 989,
1138
+ "##ɾ": 990,
1139
+ "##ʲ": 991,
1140
+ "##フ": 992,
1141
+ "##უ": 993,
1142
+ "##ภ": 994,
1143
+ "##ด": 995,
1144
+ "##ב": 996,
1145
+ "##ת": 997,
1146
+ "##خ": 998,
1147
+ "##ラ": 999,
1148
+ "##れ": 1000,
1149
+ "##ण": 1001,
1150
+ "##स": 1002,
1151
+ "##न": 1003,
1152
+ "##ه": 1004,
1153
+ "##ف": 1005,
1154
+ "##ر": 1006,
1155
+ "##エ": 1007,
1156
+ "##テ": 1008,
1157
+ "##ษ": 1009,
1158
+ "##ฐ": 1010,
1159
+ "##ィ": 1011,
1160
+ "##क": 1012,
1161
+ "##ノ": 1013,
1162
+ "##θ": 1014,
1163
+ "##ネ": 1015,
1164
+ "##ョ": 1016,
1165
+ "##δ": 1017,
1166
+ "##ɽ": 1018,
1167
+ "##ʁ": 1019,
1168
+ "##ტ": 1020,
1169
+ "##ჱ": 1021,
1170
+ "##ェ": 1022,
1171
+ "##ハ": 1023,
1172
+ "##υ": 1024,
1173
+ "##र": 1025,
1174
+ "##х": 1026,
1175
+ "##も": 1027,
1176
+ "##っ": 1028,
1177
+ "##ょ": 1029,
1178
+ "##に": 1030,
1179
+ "##γ": 1031,
1180
+ "##ც": 1032,
1181
+ "##ე": 1033,
1182
+ "##є": 1034,
1183
+ "##м": 1035,
1184
+ "##ܕ": 1036,
1185
+ "##ܝ": 1037,
1186
+ "##ܢ": 1038,
1187
+ "##ܬ": 1039,
1188
+ "##ณ": 1040,
1189
+ "##ม": 1041,
1190
+ "##ฮ": 1042,
1191
+ "##ж": 1043,
1192
+ "##ם": 1044,
1193
+ "##ء": 1045,
1194
+ "##ʊ": 1046,
1195
+ "##ई": 1047,
1196
+ "##め": 1048,
1197
+ "##მ": 1049,
1198
+ "##ム": 1050,
1199
+ "##チ": 1051,
1200
+ "##ᵻ": 1052,
1201
+ "##ˌ": 1053,
1202
+ "##ו": 1054,
1203
+ "##ף": 1055,
1204
+ "##წ": 1056,
1205
+ "##ფ": 1057,
1206
+ "##ャ": 1058,
1207
+ "##モ": 1059,
1208
+ "##ɐ": 1060,
1209
+ "##ᅦ": 1061,
1210
+ "##ᅩ": 1062,
1211
+ "##ᆨ": 1063,
1212
+ "##ᅵ": 1064,
1213
+ "##ᆸ": 1065,
1214
+ "##ᅧ": 1066,
1215
+ "##ᆼ": 1067,
1216
+ "##ᄋ": 1068,
1217
+ "##ᆫ": 1069,
1218
+ "##わ": 1070,
1219
+ "##ı": 1071,
1220
+ "##ქ": 1072,
1221
+ "##დ": 1073,
1222
+ "##ि": 1074,
1223
+ "##ჲ": 1075,
1224
+ "##ר": 1076,
1225
+ "##セ": 1077,
1226
+ "##オ": 1078,
1227
+ "##ゆ": 1079,
1228
+ "##せ": 1080,
1229
+ "##ك": 1081,
1230
+ "##ʿ": 1082,
1231
+ "##ש": 1083,
1232
+ "##מ": 1084,
1233
+ "##צ": 1085,
1234
+ "##п": 1086,
1235
+ "##г": 1087,
1236
+ "##カ": 1088,
1237
+ "##ܠ": 1089,
1238
+ "##ܗ": 1090,
1239
+ "##ܐ": 1091,
1240
+ "##ナ": 1092,
1241
+ "##ミ": 1093,
1242
+ "##こ": 1094,
1243
+ "##を": 1095,
1244
+ "##ψ": 1096,
1245
+ "##サ": 1097,
1246
+ "##ォ": 1098,
1247
+ "##π": 1099,
1248
+ "##ト": 1100,
1249
+ "##у": 1101,
1250
+ "##ح": 1102,
1251
+ "##σ": 1103,
1252
+ "##เ": 1104,
1253
+ "##ป": 1105,
1254
+ "##ш": 1106,
1255
+ "##ゥ": 1107,
1256
+ "##ロ": 1108,
1257
+ "##া": 1109,
1258
+ "##হ": 1110,
1259
+ "##ɜ": 1111,
1260
+ "##ة": 1112,
1261
+ "##ص": 1113,
1262
+ "##ס": 1114,
1263
+ "##ث": 1115,
1264
+ "##ჳ": 1116,
1265
+ "##נ": 1117,
1266
+ "##ذ": 1118,
1267
+ "##ग": 1119,
1268
+ "##ɫ": 1120,
1269
+ "##ц": 1121,
1270
+ "##ь": 1122,
1271
+ "##ю": 1123
1272
+ }
1273
+ }
1274
+ }
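For reference, the `tokenizer.json` above is a self-contained fast-tokenizer definition (BertNormalizer, BertPreTokenizer, a WordPiece model, and a TemplateProcessing post-processor), so it can be loaded on its own with the `tokenizers` library. The sketch below assumes the file has been saved locally as `tokenizer.json`; the example sentences are arbitrary.

```python
# Sketch: load the standalone tokenizer.json shown above and encode a sentence pair.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")  # local path is an assumption

# Pair encodings follow the TemplateProcessing template defined above:
# [CLS] A [SEP] B [SEP], with type_id 0 for the first segment and 1 for the second.
enc = tok.encode("hello world", "goodbye")

print(enc.tokens)    # starts with '[CLS]' and ends with '[SEP]'; text is lower-cased by BertNormalizer
print(enc.type_ids)  # zeros for "[CLS] A [SEP]", ones for "B [SEP]"
print(enc.ids[0])    # 2, the id assigned to [CLS] in the added_tokens list above
```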
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "do_basic_tokenize": true,
4
+ "do_lower_case": true,
5
+ "mask_token": "[MASK]",
6
+ "model_max_length": 512,
7
+ "name_or_path": "temp/dummy/bert/processors",
8
+ "never_split": null,
9
+ "pad_token": "[PAD]",
10
+ "sep_token": "[SEP]",
11
+ "special_tokens_map_file": null,
12
+ "strip_accents": null,
13
+ "tokenize_chinese_chars": true,
14
+ "tokenizer_class": "BertTokenizer",
15
+ "unk_token": "[UNK]"
16
+ }
vocab.txt ADDED
@@ -0,0 +1,1124 @@
1
+ [PAD]
2
+ [UNK]
3
+ [CLS]
4
+ [SEP]
5
+ [MASK]
6
+ !
7
+ "
8
+ #
9
+ $
10
+ %
11
+ &
12
+ '
13
+ (
14
+ )
15
+ *
16
+ +
17
+ ,
18
+ -
19
+ .
20
+ /
21
+ 0
22
+ 1
23
+ 2
24
+ 3
25
+ 4
26
+ 5
27
+ 6
28
+ 7
29
+ 8
30
+ 9
31
+ :
32
+ ;
33
+ <
34
+ =
35
+ >
36
+ ?
37
+ @
38
+ [
39
+ \
40
+ ]
41
+ ^
42
+ _
43
+ `
44
+ a
45
+ b
46
+ c
47
+ d
48
+ e
49
+ f
50
+ g
51
+ h
52
+ i
53
+ j
54
+ k
55
+ l
56
+ m
57
+ n
58
+ o
59
+ p
60
+ q
61
+ r
62
+ s
63
+ t
64
+ u
65
+ v
66
+ w
67
+ x
68
+ y
69
+ z
70
+ |
71
+ }
72
+ ~
73
+ ¡
74
+ ¢
75
+ £
76
+ ¥
77
+ §
78
+ °
79
+ ±
80
+ ²
81
+ ³
82
+ ´
83
+ µ
84
+ ·
85
+ º
86
+ ½
87
+ ¿
88
+ ×
89
+ ß
90
+ æ
91
+ ð
92
+ ø
93
+ þ
94
+ đ
95
+ ħ
96
+ ı
97
+ ł
98
+ œ
99
+ ɐ
100
+ ɑ
101
+ ɒ
102
+ ɔ
103
+ ə
104
+ ɛ
105
+ ɜ
106
+ ɡ
107
+ ɢ
108
+ ɪ
109
+ ɫ
110
+ ɳ
111
+ ɽ
112
+ ɾ
113
+ ʁ
114
+ ʃ
115
+ ʊ
116
+ ʋ
117
+ ʒ
118
+ ʔ
119
+ ʕ
120
+ ʲ
121
+ ʻ
122
+ ʼ
123
+ ʾ
124
+ ʿ
125
+ ˈ
126
+ ˌ
127
+ ː
128
+ α
129
+ β
130
+ γ
131
+ δ
132
+ ε
133
+ η
134
+ θ
135
+ ι
136
+ κ
137
+ λ
138
+ μ
139
+ ν
140
+ ξ
141
+ ο
142
+ π
143
+ ρ
144
+ ς
145
+ σ
146
+ τ
147
+ υ
148
+ φ
149
+ χ
150
+ ψ
151
+ ω
152
+ а
153
+ б
154
+ в
155
+ г
156
+ д
157
+ е
158
+ ж
159
+ з
160
+ и
161
+ к
162
+ л
163
+ м
164
+ н
165
+ о
166
+ п
167
+ р
168
+ с
169
+ т
170
+ у
171
+ х
172
+ ц
173
+ ш
174
+ ъ
175
+ ы
176
+ ь
177
+ ю
178
+ я
179
+ є
180
+ א
181
+ ב
182
+ ג
183
+ ה
184
+ ו
185
+ ז
186
+ ח
187
+ י
188
+ ל
189
+ ם
190
+ מ
191
+ ן
192
+ נ
193
+ ס
194
+ ף
195
+ פ
196
+ צ
197
+ ר
198
+ ש
199
+ ת
200
+ ء
201
+ ا
202
+ ب
203
+ ة
204
+ ت
205
+ ث
206
+ ج
207
+ ح
208
+ خ
209
+ د
210
+ ذ
211
+ ر
212
+ س
213
+ ش
214
+ ص
215
+ ع
216
+ ف
217
+ ق
218
+ ك
219
+ ل
220
+ م
221
+ ن
222
+ ه
223
+ و
224
+ ي
225
+ ܐ
226
+ ܕ
227
+ ܗ
228
+ ܝ
229
+ ܠ
230
+ ܢ
231
+ ܬ
232
+
233
+
234
+
235
+
236
+
237
+
238
+
239
+
240
+
241
+
242
+
243
+
244
+
245
+
246
+
247
+
248
+
249
+
250
+ ि
251
+
252
+
253
+
254
+
255
+
256
+
257
+
258
+
259
+
260
+
261
+
262
+
263
+
264
+
265
+
266
+
267
+
268
+
269
+
270
+
271
+
272
+
273
+
274
+
275
+
276
+
277
+
278
+
279
+
280
+
281
+
282
+
283
+
284
+
285
+
286
+
287
+
288
+
289
+
290
+
291
+
292
+
293
+
294
+
295
+
296
+
297
+
298
+
299
+
300
+
301
+
302
+
303
+
304
+
305
+
306
+
307
+
308
+
309
+
310
+
311
+
312
+
313
+
314
+
315
+
316
+
317
+
318
+
319
+
320
+
321
+
322
+
323
+
324
+
325
+
326
+
327
+
328
+
329
+
330
+
331
+
332
+
333
+
334
+
335
+
336
+
337
+
338
+
339
+
340
+
341
+
342
+
343
+
344
+
345
+
346
+
347
+
348
+
349
+
350
+
351
+
352
+
353
+
354
+
355
+
356
+
357
+
358
+
359
+
360
+
361
+
362
+
363
+
364
+
365
+
366
+
367
+
368
+
369
+
370
+
371
+
372
+
373
+
374
+
375
+
376
+
377
+
378
+
379
+
380
+
381
+
382
+
383
+
384
+
385
+
386
+
387
+
388
+
389
+
390
+
391
+
392
+
393
+
394
+
395
+
396
+
397
+
398
+
399
+
400
+
401
+
402
+
403
+
404
+
405
+
406
+
407
+
408
+
409
+
410
+
411
+
412
+
413
+
414
+
415
+
416
+
417
+
418
+
419
+
420
+
421
+
422
+
423
+
424
+
425
+
426
+
427
+
428
+
429
+
430
+
431
+
432
+
433
+
434
+
435
+
436
+
437
+
438
+
439
+
440
+
441
+
442
+
443
+
444
+
445
+
446
+
447
+
448
+
449
+
450
+
451
+
452
+
453
+
454
+
455
+
456
+
457
+
458
+
459
+
460
+
461
+
462
+
463
+
464
+
465
+
466
+
467
+
468
+
469
+
470
+
471
+
472
+
473
+
474
+
475
+
476
+
477
+
478
+
479
+
480
+
481
+
482
+
483
+
484
+
485
+
486
+
487
+
488
+
489
+
490
+
491
+
492
+
493
+
494
+
495
+
496
+
497
+
498
+
499
+
500
+
501
+
502
+
503
+
504
+
505
+
506
+
507
+
508
+
509
+
510
+
511
+
512
+
513
+
514
+
515
+
516
+
517
+
518
+
519
+
520
+
521
+
522
+
523
+
524
+
525
+
526
+
527
+
528
+
529
+
530
+
531
+
532
+
533
+
534
+
535
+
536
+
537
+
538
+
539
+
540
+
541
+
542
+
543
+
544
+
545
+
546
+
547
+
548
+
549
+
550
+
551
+
552
+
553
+
554
+
555
+
556
+
557
+
558
+
559
+
560
+
561
+
562
+
563
+
564
+
565
+
566
+
567
+
568
+
569
+
570
+
571
+
572
+
573
+
574
+
575
+
576
+
577
+
578
+
579
+ 广
580
+
581
+
582
+
583
+
584
+
585
+
586
+
587
+
588
+
589
+
590
+
591
+
592
+
593
+
594
+
595
+
596
+
597
+
598
+
599
+
600
+
601
+
602
+
603
+
604
+
605
+
606
+
607
+
608
+
609
+
610
+
611
+
612
+
613
+
614
+
615
+
616
+
617
+
618
+
619
+
620
+
621
+
622
+
623
+
624
+
625
+
626
+
627
+
628
+
629
+
630
+
631
+
632
+
633
+
634
+
635
+
636
+
637
+
638
+
639
+
640
+
641
+
642
+
643
+
644
+ 殿
645
+
646
+
647
+
648
+
649
+
650
+
651
+
652
+
653
+
654
+
655
+
656
+
657
+
658
+
659
+
660
+
661
+
662
+
663
+
664
+
665
+
666
+
667
+
668
+
669
+
670
+
671
+
672
+
673
+
674
+
675
+
676
+
677
+
678
+
679
+
680
+
681
+
682
+
683
+
684
+
685
+
686
+
687
+
688
+
689
+
690
+
691
+
692
+
693
+
694
+
695
+
696
+
697
+
698
+
699
+
700
+
701
+
702
+
703
+
704
+
705
+
706
+
707
+
708
+
709
+
710
+
711
+
712
+
713
+
714
+
715
+
716
+
717
+
718
+
719
+
720
+
721
+
722
+
723
+
724
+
725
+
726
+
727
+
728
+
729
+
730
+
731
+
732
+
733
+
734
+
735
+
736
+
737
+
738
+
739
+
740
+
741
+
742
+
743
+
744
+
745
+
746
+
747
+
748
+
749
+
750
+
751
+
752
+
753
+
754
+
755
+
756
+
757
+
758
+
759
+
760
+
761
+
762
+
763
+
764
+
765
+
766
+
767
+
768
+
769
+
770
+
771
+
772
+
773
+
774
+
775
+
776
+
777
+
778
+
779
+
780
+
781
+
782
+
783
+
784
+
785
+
786
+
787
+
788
+
789
+
790
+
791
+ ##i
792
+ ##y
793
+ ##o
794
+ ##r
795
+ ##g
796
+ ##a
797
+ ##w
798
+ ##l
799
+ ##b
800
+ ##z
801
+ ##t
802
+ ##n
803
+ ##c
804
+ ##h
805
+ ##s
806
+ ##u
807
+ ##d
808
+ ##e
809
+ ##k
810
+ ##v
811
+ ##f
812
+ ##x
813
+ ##q
814
+ ##p
815
+ ##æ
816
+ ##0
817
+ ##5
818
+ ##m
819
+ ##8
820
+ ##4
821
+ ##س
822
+ ##ت
823
+ ##ا
824
+ ##ن
825
+ ##6
826
+ ##1
827
+ ##7
828
+ ##j
829
+ ##つ
830
+ ##う
831
+ ##2
832
+ ##9
833
+ ##3
834
+ ##ø
835
+ ##ล
836
+ ##ว
837
+ ##ง
838
+ ##พ
839
+ ##ไ
840
+ ##ช
841
+ ##ย
842
+ ##า
843
+ ##ร
844
+ ##თ
845
+ ##ა
846
+ ##ვ
847
+ ##რ
848
+ ##ი
849
+ ##ള
850
+ ##あ
851
+ ##ん
852
+ ##α
853
+ ##ν
854
+ ##τ
855
+ ##ο
856
+ ##κ
857
+ ##ρ
858
+ ##ω
859
+ ##ς
860
+ ##の
861
+ ##な
862
+ ##ら
863
+ ##ð
864
+ ##œ
865
+ ##ɛ
866
+ ##ł
867
+ ##η
868
+ ##μ
869
+ ##ซ
870
+ ##ル
871
+ ##シ
872
+ ##ア
873
+ ##リ
874
+ ##ス
875
+ ##ʔ
876
+ ##ल
877
+ ##ᄇ
878
+ ##ᅮ
879
+ ##ᄃ
880
+ ##ᅢ
881
+ ##β
882
+ ##ß
883
+ ##か
884
+ ##た
885
+ ##ə
886
+ ##ʻ
887
+ ##ι
888
+ ##χ
889
+ ##о
890
+ ##л
891
+ ##с
892
+ ##а
893
+ ##т
894
+ ##ы
895
+ ##и
896
+ ##в
897
+ ##к
898
+ ##з
899
+ ##ッ
900
+ ##ク
901
+ ##マ
902
+ ##ン
903
+ ##გ
904
+ ##ლ
905
+ ##ო
906
+ ##ნ
907
+ ##ː
908
+ ##ל
909
+ ##ה
910
+ ##א
911
+ ##く
912
+ ##み
913
+ ##ε
914
+ ##ξ
915
+ ##ল
916
+ ##ˈ
917
+ ##ɡ
918
+ ##ɑ
919
+ ##ɒ
920
+ ##し
921
+ ##す
922
+ ##き
923
+ ##ひ
924
+ ##と
925
+ ##đ
926
+ ##ъ
927
+ ##н
928
+ ##е
929
+ ##י
930
+ ##פ
931
+ ##イ
932
+ ##λ
933
+ ##ق
934
+ ##ع
935
+ ##د
936
+ ##ᅡ
937
+ ##ᆯ
938
+ ##ᄅ
939
+ ##ɪ
940
+ ##ค
941
+ ##ต
942
+ ##व
943
+ ##ा
944
+ ##द
945
+ ##は
946
+ ##り
947
+ ##レ
948
+ ##ー
949
+ ##ツ
950
+ ##ي
951
+ ##ش
952
+ ##و
953
+ ##م
954
+ ##º
955
+ ##ਲ
956
+ ##ਾ
957
+ ##ਹ
958
+ ##д
959
+ ##р
960
+ ##ل
961
+ ##ب
962
+ ##い
963
+ ##ち
964
+ ##ゃ
965
+ ##ʒ
966
+ ##ʃ
967
+ ##ɔ
968
+ ##ह
969
+ ##ニ
970
+ ##ウ
971
+ ##ァ
972
+ ##キ
973
+ ##ュ
974
+ ##3
975
+ ##ხ
976
+ ##ს
977
+ ##お
978
+ ##タ
979
+ ##ാ
980
+ ##ഹ
981
+ ##ɳ
982
+ ##ま
983
+ ##る
984
+ ##ะ
985
+ ##อ
986
+ ##น
987
+ ##ן
988
+ ##я
989
+ ##แ
990
+ ##ก
991
+ ##ɾ
992
+ ##ʲ
993
+ ##フ
994
+ ##უ
995
+ ##ภ
996
+ ##ด
997
+ ##ב
998
+ ##ת
999
+ ##خ
1000
+ ##ラ
1001
+ ##れ
1002
+ ##ण
1003
+ ##स
1004
+ ##न
1005
+ ##ه
1006
+ ##ف
1007
+ ##ر
1008
+ ##エ
1009
+ ##テ
1010
+ ##ษ
1011
+ ##ฐ
1012
+ ##ィ
1013
+ ##क
1014
+ ##ノ
1015
+ ##θ
1016
+ ##ネ
1017
+ ##��
1018
+ ##δ
1019
+ ##ɽ
1020
+ ##ʁ
1021
+ ##ტ
1022
+ ##ჱ
1023
+ ##ェ
1024
+ ##ハ
1025
+ ##υ
1026
+ ##र
1027
+ ##х
1028
+ ##も
1029
+ ##っ
1030
+ ##ょ
1031
+ ##に
1032
+ ##γ
1033
+ ##ც
1034
+ ##ე
1035
+ ##є
1036
+ ##м
1037
+ ##ܕ
1038
+ ##ܝ
1039
+ ##ܢ
1040
+ ##ܬ
1041
+ ##ณ
1042
+ ##ม
1043
+ ##ฮ
1044
+ ##ж
1045
+ ##ם
1046
+ ##ء
1047
+ ##ʊ
1048
+ ##ई
1049
+ ##め
1050
+ ##მ
1051
+ ##ム
1052
+ ##チ
1053
+ ##ᵻ
1054
+ ##ˌ
1055
+ ##ו
1056
+ ##ף
1057
+ ##წ
1058
+ ##ფ
1059
+ ##ャ
1060
+ ##モ
1061
+ ##ɐ
1062
+ ##ᅦ
1063
+ ##ᅩ
1064
+ ##ᆨ
1065
+ ##ᅵ
1066
+ ##ᆸ
1067
+ ##ᅧ
1068
+ ##ᆼ
1069
+ ##ᄋ
1070
+ ##ᆫ
1071
+ ##わ
1072
+ ##ı
1073
+ ##ქ
1074
+ ##დ
1075
+ ##ि
1076
+ ##ჲ
1077
+ ##ר
1078
+ ##セ
1079
+ ##オ
1080
+ ##ゆ
1081
+ ##せ
1082
+ ##ك
1083
+ ##ʿ
1084
+ ##ש
1085
+ ##מ
1086
+ ##צ
1087
+ ##п
1088
+ ##г
1089
+ ##カ
1090
+ ##ܠ
1091
+ ##ܗ
1092
+ ##ܐ
1093
+ ##ナ
1094
+ ##ミ
1095
+ ##こ
1096
+ ##を
1097
+ ##ψ
1098
+ ##サ
1099
+ ##ォ
1100
+ ##π
1101
+ ##ト
1102
+ ##у
1103
+ ##ح
1104
+ ##σ
1105
+ ##เ
1106
+ ##ป
1107
+ ##ш
1108
+ ##ゥ
1109
+ ##ロ
1110
+ ##া
1111
+ ##হ
1112
+ ##ɜ
1113
+ ##ة
1114
+ ##ص
1115
+ ##ס
1116
+ ##ث
1117
+ ##ჳ
1118
+ ##נ
1119
+ ##ذ
1120
+ ##ग
1121
+ ##ɫ
1122
+ ##ц
1123
+ ##ь
1124
+ ##ю
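For reference, vocab.txt stores one token per line, and the token id is the zero-based line index: [PAD] on line 1 is id 0, and ##ю on line 1124 is id 1123, matching the id shown for it in the JSON vocab at the top of this diff. A minimal sketch of reading the mapping back, assuming the file has been downloaded locally as vocab.txt:

# Hypothetical local read of the vocab file added in this commit.
with open("vocab.txt", encoding="utf-8") as f:
    id_to_token = [line.rstrip("\n") for line in f]

assert id_to_token[0] == "[PAD]"     # first line  -> id 0
assert id_to_token[1123] == "##ю"    # last line   -> id 1123
assert len(id_to_token) == 1124      # matches the +1,1124 hunk header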