indiejoseph committed on
Commit 8d1b812 · 1 Parent(s): ea0810d

Add new SentenceTransformer model.

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false
+ }
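
The pooling config above enables only `pooling_mode_mean_tokens`, which suggests a sentence vector is formed by averaging the token vectors that the attention mask marks as real tokens. Below is a minimal pure-Python sketch of that step. It uses toy 3-dimensional token vectors instead of the model's 768-dimensional ones, and the helpers `active_pooling_modes` and `mean_pool` are hypothetical illustrations, not part of this commit or of the sentence-transformers API.

```python
import json

# Pooling settings as added in 1_Pooling/config.json (copied from the diff above).
POOLING_CONFIG = json.loads("""
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
""")


def active_pooling_modes(config):
    """Return the names of all pooling modes switched on in the config."""
    return [key for key, value in config.items()
            if key.startswith("pooling_mode_") and value is True]


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (hypothetical helper)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input (assumed values): three real tokens followed by one padding token.
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

print(active_pooling_modes(POOLING_CONFIG))  # only the mean-tokens mode is on
print(mean_pool(tokens, mask))               # the padding token is ignored
```

The padding vector is excluded before averaging, which is the same effect the attention-mask multiplication has in the README's `mean_pooling` function.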
README.md ADDED
@@ -0,0 +1,94 @@
+ ---
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+
+ ---
+
+ # indiejoseph/bert-cantonese-sts
+
+ This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+ <!--- Describe your model here -->
+
+ ## Usage (Sentence-Transformers)
+
+ Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
+
+ ```
+ pip install -U sentence-transformers
+ ```
+
+ Then you can use the model like this:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ sentences = ["This is an example sentence", "Each sentence is converted"]
+
+ model = SentenceTransformer('indiejoseph/bert-cantonese-sts')
+ embeddings = model.encode(sentences)
+ print(embeddings)
+ ```
+
+
+
+ ## Usage (HuggingFace Transformers)
+ Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+
+ # Mean pooling - take the attention mask into account for correct averaging
+ def mean_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+ # Sentences we want sentence embeddings for
+ sentences = ['This is an example sentence', 'Each sentence is converted']
+
+ # Load model from HuggingFace Hub
+ tokenizer = AutoTokenizer.from_pretrained('indiejoseph/bert-cantonese-sts')
+ model = AutoModel.from_pretrained('indiejoseph/bert-cantonese-sts')
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # Perform pooling. In this case, mean pooling.
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
+
+
+
+ ## Evaluation Results
+
+ <!--- Describe how your model was evaluated -->
+
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=indiejoseph/bert-cantonese-sts)
+
+
+
+ ## Full Model Architecture
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
+ )
+ ```
+
+ ## Citing & Authors
+
+ <!--- Describe where people can find more information -->
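
The README above tags the model for `sentence-similarity`, but its examples stop at printing the embeddings. One common way to compare two sentence embeddings is cosine similarity. The sketch below is pure Python and uses short toy vectors in place of real 768-dimensional model outputs; `cosine_similarity` and the sample values are hypothetical and are not defined anywhere in this commit.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for sentence embeddings (assumed values, not model output).
emb_a = [0.2, 0.1, 0.7]
emb_b = [0.2, 0.1, 0.6]
emb_c = [-0.5, 0.9, 0.0]

# A higher score means the two vectors point in a more similar direction.
print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

With real embeddings, each row returned by `model.encode(sentences)` could be converted to a list and passed to the same function. The score ranges from -1 to 1.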
added_tokens.json ADDED
@@ -0,0 +1,1467 @@
1
+ {
2
+ "㚻": 21615,
3
+ "㜺": 22407,
4
+ "㞗": 21701,
5
+ "㞘": 22096,
6
+ "㩒": 21141,
7
+ "㬎": 21853,
8
+ "㴓": 22011,
9
+ "㶞": 22576,
10
+ "㷫": 21621,
11
+ "㹢": 22294,
12
+ "㹴": 21573,
13
+ "㺢": 22293,
14
+ "㻋": 22034,
15
+ "㻗": 22485,
16
+ "䃟": 21402,
17
+ "䓪": 21509,
18
+ "䓬": 22380,
19
+ "䥑": 21314,
20
+ "䴉": 22131,
21
+ "䶄": 22591,
22
+ "䶮": 22228,
23
+ "丏": 22210,
24
+ "乭": 21894,
25
+ "乸": 21339,
26
+ "亶": 21411,
27
+ "亹": 22209,
28
+ "仡": 21161,
29
+ "仫": 22198,
30
+ "仵": 22446,
31
+ "伋": 21798,
32
+ "伷": 21932,
33
+ "佉": 22206,
34
+ "佮": 22468,
35
+ "佺": 22396,
36
+ "佾": 21841,
37
+ "侂": 21311,
38
+ "侔": 22023,
39
+ "侘": 22232,
40
+ "俁": 21842,
41
+ "俅": 22539,
42
+ "俣": 21190,
43
+ "俵": 21504,
44
+ "俶": 21661,
45
+ "倣": 21421,
46
+ "倧": 22346,
47
+ "倮": 21955,
48
+ "倻": 21972,
49
+ "偲": 21625,
50
+ "傈": 21942,
51
+ "傕": 21372,
52
+ "僆": 22181,
53
+ "僉": 21887,
54
+ "僊": 21703,
55
+ "僞": 21510,
56
+ "僰": 22441,
57
+ "僳": 21943,
58
+ "儁": 21375,
59
+ "儇": 21639,
60
+ "儔": 21967,
61
+ "儸": 21950,
62
+ "儺": 22508,
63
+ "兗": 21155,
64
+ "兪": 22321,
65
+ "冑": 21306,
66
+ "冚": 21249,
67
+ "冧": 21283,
68
+ "冫": 22410,
69
+ "冴": 21399,
70
+ "凖": 22443,
71
+ "凫": 21833,
72
+ "凵": 22229,
73
+ "凼": 22371,
74
+ "刋": 22394,
75
+ "剡": 22377,
76
+ "劄": 21333,
77
+ "劻": 22046,
78
+ "勅": 21939,
79
+ "勔": 22360,
80
+ "勗": 21581,
81
+ "勣": 21795,
82
+ "勰": 22078,
83
+ "勱": 22055,
84
+ "勲": 21865,
85
+ "勷": 21675,
86
+ "匂": 22185,
87
+ "匏": 21455,
88
+ "匚": 22419,
89
+ "匸": 22488,
90
+ "卌": 21148,
91
+ "卐": 22517,
92
+ "卬": 22285,
93
+ "卹": 21320,
94
+ "卽": 21292,
95
+ "厏": 22559,
96
+ "厓": 22151,
97
+ "厠": 22329,
98
+ "厰": 22383,
99
+ "叄": 22298,
100
+ "吔": 21215,
101
+ "吲": 21547,
102
+ "吿": 21202,
103
+ "呑": 21533,
104
+ "呔": 21901,
105
+ "呪": 21910,
106
+ "咇": 22004,
107
+ "哙": 22578,
108
+ "哚": 22213,
109
+ "唂": 21759,
110
+ "唞": 21514,
111
+ "唥": 21279,
112
+ "唨": 22094,
113
+ "唪": 21278,
114
+ "唻": 22170,
115
+ "啹": 22054,
116
+ "喐": 21599,
117
+ "喥": 21534,
118
+ "喦": 22255,
119
+ "喼": 21786,
120
+ "喾": 21692,
121
+ "嗉": 21130,
122
+ "嗌": 21128,
123
+ "嗩": 22496,
124
+ "嗮": 21144,
125
+ "嗱": 21216,
126
+ "嘏": 22147,
127
+ "嘥": 21218,
128
+ "噉": 21133,
129
+ "噍": 21529,
130
+ "噏": 21530,
131
+ "噚": 22063,
132
+ "噝": 22028,
133
+ "噣": 22589,
134
+ "噯": 21214,
135
+ "噲": 21964,
136
+ "嚈": 22033,
137
+ "嚙": 21528,
138
+ "嚜": 21355,
139
+ "嚡": 22572,
140
+ "嚤": 21169,
141
+ "嚦": 22424,
142
+ "嚫": 21600,
143
+ "嚳": 22006,
144
+ "嚿": 21245,
145
+ "囓": 21914,
146
+ "囘": 21686,
147
+ "圀": 22256,
148
+ "圉": 22240,
149
+ "圮": 22241,
150
+ "圯": 21927,
151
+ "圹": 22237,
152
+ "坌": 21989,
153
+ "坧": 22416,
154
+ "坩": 21731,
155
+ "坭": 21522,
156
+ "坻": 22191,
157
+ "垌": 21937,
158
+ "垓": 21271,
159
+ "垟": 22582,
160
+ "垧": 22465,
161
+ "垸": 21769,
162
+ "垻": 21928,
163
+ "埈": 22190,
164
+ "埐": 22376,
165
+ "埒": 21971,
166
+ "埜": 21921,
167
+ "埞": 21377,
168
+ "埡": 22352,
169
+ "埭": 21844,
170
+ "埲": 21835,
171
+ "埴": 21542,
172
+ "堉": 22068,
173
+ "堊": 21720,
174
+ "堝": 21732,
175
+ "堞": 21724,
176
+ "堠": 22402,
177
+ "塙": 21505,
178
+ "塡": 22124,
179
+ "塱": 21152,
180
+ "塲": 21714,
181
+ "塹": 21519,
182
+ "塽": 21926,
183
+ "墬": 22177,
184
+ "壎": 21765,
185
+ "壙": 21793,
186
+ "壠": 22276,
187
+ "壱": 21444,
188
+ "夂": 22509,
189
+ "夤": 22203,
190
+ "夨": 22359,
191
+ "夬": 22413,
192
+ "奀": 21913,
193
+ "奭": 21772,
194
+ "妫": 21941,
195
+ "姈": 22592,
196
+ "姮": 22274,
197
+ "姵": 22105,
198
+ "娸": 21851,
199
+ "媯": 22378,
200
+ "媺": 21407,
201
+ "嫗": 21503,
202
+ "嫪": 21396,
203
+ "嫺": 22164,
204
+ "嫿": 21291,
205
+ "嬈": 22277,
206
+ "嬋": 21936,
207
+ "嬲": 21211,
208
+ "孋": 22518,
209
+ "孭": 21532,
210
+ "孲": 21467,
211
+ "孻": 21452,
212
+ "宀": 21655,
213
+ "宍": 21898,
214
+ "寔": 21349,
215
+ "寳": 21734,
216
+ "尅": 21741,
217
+ "尐": 21217,
218
+ "尙": 21521,
219
+ "尢": 21242,
220
+ "尪": 22104,
221
+ "屘": 21883,
222
+ "屙": 21426,
223
+ "屭": 22471,
224
+ "屻": 22021,
225
+ "岃": 22533,
226
+ "峄": 22403,
227
+ "峠": 21134,
228
+ "崃": 22482,
229
+ "崟": 22356,
230
+ "崢": 21248,
231
+ "崤": 21984,
232
+ "嵗": 21713,
233
+ "嵴": 21722,
234
+ "嶗": 22433,
235
+ "嶝": 22411,
236
+ "嶠": 21536,
237
+ "嶷": 21969,
238
+ "嶸": 22340,
239
+ "巂": 22267,
240
+ "巋": 22490,
241
+ "帙": 22244,
242
+ "幗": 21592,
243
+ "幪": 21228,
244
+ "幷": 22058,
245
+ "庑": 21461,
246
+ "庥": 22372,
247
+ "庯": 22431,
248
+ "廄": 21172,
249
+ "廆": 21499,
250
+ "廍": 21993,
251
+ "廐": 22261,
252
+ "廙": 22150,
253
+ "廡": 21740,
254
+ "廨": 21325,
255
+ "廩": 21440,
256
+ "廪": 21264,
257
+ "廸": 21863,
258
+ "廻": 21327,
259
+ "廼": 22521,
260
+ "弇": 21412,
261
+ "弐": 22473,
262
+ "弢": 22350,
263
+ "彊": 21312,
264
+ "彔": 22444,
265
+ "彖": 21157,
266
+ "彘": 21255,
267
+ "彜": 22252,
268
+ "彡": 21448,
269
+ "彣": 22174,
270
+ "徂": 21408,
271
+ "徫": 21871,
272
+ "徭": 21872,
273
+ "忞": 21464,
274
+ "怛": 22219,
275
+ "恂": 22212,
276
+ "恊": 21775,
277
+ "恽": 21459,
278
+ "悝": 22342,
279
+ "惗": 21885,
280
+ "惣": 21403,
281
+ "惲": 21525,
282
+ "惻": 21329,
283
+ "愃": 22070,
284
+ "愎": 21506,
285
+ "愔": 21999,
286
+ "愨": 21379,
287
+ "愬": 21330,
288
+ "愴": 21821,
289
+ "慇": 22357,
290
+ "慜": 22553,
291
+ "慤": 21609,
292
+ "憓": 21420,
293
+ "憙": 22367,
294
+ "懌": 21884,
295
+ "懽": 22035,
296
+ "戇": 21524,
297
+ "戉": 21987,
298
+ "戔": 21603,
299
+ "戙": 21628,
300
+ "戢": 21137,
301
+ "戥": 21439,
302
+ "戩": 21864,
303
+ "戯": 21648,
304
+ "扞": 22000,
305
+ "扺": 22214,
306
+ "抌": 21537,
307
+ "抺": 21469,
308
+ "拃": 21288,
309
+ "拏": 21757,
310
+ "拶": 21651,
311
+ "挐": 21659,
312
+ "挿": 21447,
313
+ "捜": 21647,
314
+ "捫": 21165,
315
+ "捹": 21435,
316
+ "捽": 21540,
317
+ "掄": 22439,
318
+ "掕": 21225,
319
+ "掟": 21234,
320
+ "掹": 21381,
321
+ "掾": 21405,
322
+ "揈": 21869,
323
+ "揞": 21991,
324
+ "揼": 21416,
325
+ "揾": 21224,
326
+ "搆": 21173,
327
+ "搣": 22183,
328
+ "搦": 21388,
329
+ "搧": 22012,
330
+ "搲": 21992,
331
+ "搾": 21847,
332
+ "摑": 21918,
333
+ "摭": 21970,
334
+ "摱": 21568,
335
+ "摶": 21596,
336
+ "摷": 22196,
337
+ "撘": 22250,
338
+ "撣": 21748,
339
+ "撳": 21142,
340
+ "撾": 21688,
341
+ "擧": 21538,
342
+ "擸": 22370,
343
+ "攆": 22408,
344
+ "攋": 21179,
345
+ "攰": 21390,
346
+ "攴": 22330,
347
+ "攷": 21129,
348
+ "斉": 21543,
349
+ "斕": 22347,
350
+ "旄": 22279,
351
+ "旉": 21788,
352
+ "旒": 22339,
353
+ "旚": 21770,
354
+ "旛": 22186,
355
+ "旣": 21685,
356
+ "旯": 21199,
357
+ "旼": 21886,
358
+ "昃": 21762,
359
+ "昑": 22069,
360
+ "昚": 21450,
361
+ "昞": 21299,
362
+ "昪": 21583,
363
+ "昫": 21975,
364
+ "昰": 21276,
365
+ "昺": 21275,
366
+ "晄": 22188,
367
+ "晈": 22355,
368
+ "晧": 22464,
369
+ "暎": 21297,
370
+ "暘": 21177,
371
+ "暠": 21401,
372
+ "暦": 22362,
373
+ "曌": 22022,
374
+ "曩": 22281,
375
+ "曱": 21345,
376
+ "曷": 21611,
377
+ "曺": 21771,
378
+ "曽": 22322,
379
+ "朊": 22292,
380
+ "朏": 21231,
381
+ "朶": 22472,
382
+ "杘": 21867,
383
+ "杧": 22385,
384
+ "杲": 21781,
385
+ "枓": 21498,
386
+ "枞": 22266,
387
+ "柊": 21938,
388
+ "柙": 22192,
389
+ "柝": 21681,
390
+ "柰": 21438,
391
+ "柷": 21733,
392
+ "栢": 21269,
393
+ "栱": 22039,
394
+ "栲": 21149,
395
+ "栴": 22502,
396
+ "桄": 22269,
397
+ "桕": 21665,
398
+ "桫": 21905,
399
+ "桴": 21976,
400
+ "桷": 22037,
401
+ "梔": 21451,
402
+ "梘": 21387,
403
+ "梠": 21855,
404
+ "梣": 22451,
405
+ "棐": 21500,
406
+ "棓": 22161,
407
+ "棨": 22180,
408
+ "棻": 21210,
409
+ "椏": 22216,
410
+ "椤": 22060,
411
+ "椥": 22320,
412
+ "椴": 22373,
413
+ "楙": 22008,
414
+ "楢": 21766,
415
+ "楯": 21180,
416
+ "榊": 21957,
417
+ "榎": 21513,
418
+ "榘": 21678,
419
+ "榣": 22436,
420
+ "榧": 22247,
421
+ "槊": 21303,
422
+ "槙": 21654,
423
+ "槨": 22384,
424
+ "樅": 22040,
425
+ "樋": 21512,
426
+ "樖": 21219,
427
+ "樗": 21239,
428
+ "樘": 21580,
429
+ "樛": 22525,
430
+ "樨": 21836,
431
+ "樴": 21796,
432
+ "橈": 22409,
433
+ "橚": 21497,
434
+ "橛": 21226,
435
+ "檠": 22086,
436
+ "檨": 22420,
437
+ "櫈": 21468,
438
+ "櫓": 21579,
439
+ "櫟": 21558,
440
+ "櫳": 21502,
441
+ "欅": 22513,
442
+ "欏": 21906,
443
+ "欤": 21755,
444
+ "歟": 21963,
445
+ "歿": 21601,
446
+ "殂": 21398,
447
+ "殄": 21548,
448
+ "殛": 22119,
449
+ "殮": 21689,
450
+ "殻": 21690,
451
+ "毌": 21822,
452
+ "毐": 21397,
453
+ "毖": 21589,
454
+ "毬": 21571,
455
+ "氂": 21262,
456
+ "氕": 21475,
457
+ "氘": 21476,
458
+ "氚": 21479,
459
+ "氩": 21470,
460
+ "氬": 22146,
461
+ "氼": 22392,
462
+ "汊": 21768,
463
+ "汚": 21649,
464
+ "汜": 21373,
465
+ "汭": 21944,
466
+ "沆": 21705,
467
+ "沊": 21627,
468
+ "沔": 21663,
469
+ "沚": 21909,
470
+ "沩": 22199,
471
+ "泂": 21916,
472
+ "泅": 21253,
473
+ "泚": 21334,
474
+ "洎": 21756,
475
+ "洢": 22084,
476
+ "洧": 21591,
477
+ "洭": 22435,
478
+ "洺": 21787,
479
+ "浈": 21837,
480
+ "浍": 22049,
481
+ "浛": 21812,
482
+ "浞": 21791,
483
+ "浠": 21549,
484
+ "涑": 22550,
485
+ "淏": 21233,
486
+ "淛": 21382,
487
+ "淝": 21272,
488
+ "淥": 22111,
489
+ "淯": 21947,
490
+ "淸": 21227,
491
+ "渌": 22470,
492
+ "渕": 21826,
493
+ "渟": 22227,
494
+ "湉": 21432,
495
+ "湎": 22391,
496
+ "湓": 22226,
497
+ "湜": 21815,
498
+ "湞": 21811,
499
+ "湣": 22042,
500
+ "湴": 21739,
501
+ "湼": 22366,
502
+ "溆": 21244,
503
+ "溦": 22249,
504
+ "滉": 21456,
505
+ "滎": 21445,
506
+ "滏": 22537,
507
+ "滘": 21266,
508
+ "滹": 22301,
509
+ "漖": 22184,
510
+ "漚": 22453,
511
+ "漼": 21945,
512
+ "潁": 21616,
513
+ "潯": 21828,
514
+ "潽": 21296,
515
+ "澁": 22179,
516
+ "澂": 21175,
517
+ "澇": 21282,
518
+ "澌": 21742,
519
+ "澔": 21195,
520
+ "澠": 21959,
521
+ "澪": 21535,
522
+ "澯": 22159,
523
+ "澶": 21684,
524
+ "濉": 22287,
525
+ "濊": 22504,
526
+ "濞": 21919,
527
+ "濰": 21974,
528
+ "濶": 21880,
529
+ "瀍": 22369,
530
+ "瀦": 22501,
531
+ "灃": 21434,
532
+ "灕": 21940,
533
+ "灤": 21176,
534
+ "炆": 21630,
535
+ "炘": 21232,
536
+ "炟": 22100,
537
+ "炤": 22139,
538
+ "烚": 21717,
539
+ "烜": 21995,
540
+ "烝": 22101,
541
+ "烱": 21978,
542
+ "烴": 21477,
543
+ "烺": 21174,
544
+ "焌": 21268,
545
+ "焓": 21931,
546
+ "煇": 21404,
547
+ "煚": 22052,
548
+ "煠": 22556,
549
+ "煬": 21414,
550
+ "熜": 21814,
551
+ "熲": 21929,
552
+ "熺": 21790,
553
+ "燏": 22202,
554
+ "燐": 22197,
555
+ "燬": 21516,
556
+ "燶": 21881,
557
+ "燾": 21300,
558
+ "爨": 21911,
559
+ "爿": 22299,
560
+ "牀": 21290,
561
+ "牁": 22462,
562
+ "牂": 22278,
563
+ "牖": 22231,
564
+ "牘": 21711,
565
+ "犂": 22343,
566
+ "犛": 22030,
567
+ "犰": 21715,
568
+ "狁": 22515,
569
+ "狍": 21954,
570
+ "狓": 22295,
571
+ "狛": 22566,
572
+ "狳": 21716,
573
+ "狷": 22157,
574
+ "猁": 22201,
575
+ "猇": 21960,
576
+ "猊": 22386,
577
+ "猞": 22200,
578
+ "猢": 22144,
579
+ "猻": 22145,
580
+ "獁": 22296,
581
+ "獏": 22567,
582
+ "獬": 22044,
583
+ "獴": 21559,
584
+ "玆": 21640,
585
+ "玕": 21604,
586
+ "玗": 22095,
587
+ "玘": 22225,
588
+ "玢": 22527,
589
+ "玹": 21858,
590
+ "珓": 21691,
591
+ "珧": 22017,
592
+ "珽": 21642,
593
+ "琚": 21460,
594
+ "琤": 21656,
595
+ "琹": 21671,
596
+ "琿": 21431,
597
+ "瑂": 21376,
598
+ "瑈": 22067,
599
+ "瑭": 21582,
600
+ "瑷": 21428,
601
+ "璆": 22260,
602
+ "璈": 22325,
603
+ "璘": 21384,
604
+ "璠": 21670,
605
+ "璣": 21550,
606
+ "璥": 22579,
607
+ "璦": 21277,
608
+ "璩": 21792,
609
+ "璫": 21393,
610
+ "瓌": 21457,
611
+ "瓔": 22053,
612
+ "瓘": 21644,
613
+ "瓚": 21614,
614
+ "瓠": 21263,
615
+ "甂": 22160,
616
+ "甑": 21994,
617
+ "甪": 21920,
618
+ "甴": 21346,
619
+ "畈": 21669,
620
+ "畋": 21949,
621
+ "畚": 22172,
622
+ "畠": 21602,
623
+ "畧": 21436,
624
+ "畬": 22495,
625
+ "畯": 22258,
626
+ "畵": 22300,
627
+ "疃": 22445,
628
+ "疋": 21888,
629
+ "疍": 22010,
630
+ "疎": 21700,
631
+ "疴": 21441,
632
+ "痲": 21415,
633
+ "痾": 21763,
634
+ "瘰": 21488,
635
+ "癆": 21222,
636
+ "癩": 21489,
637
+ "癰": 21386,
638
+ "皝": 21156,
639
+ "皞": 21363,
640
+ "眛": 22065,
641
+ "眜": 21698,
642
+ "眭": 21902,
643
+ "睍": 21751,
644
+ "睚": 21168,
645
+ "睺": 21552,
646
+ "睼": 21617,
647
+ "瞽": 22474,
648
+ "砀": 21695,
649
+ "砈": 21802,
650
+ "砟": 22169,
651
+ "砦": 22390,
652
+ "砫": 22374,
653
+ "砬": 21973,
654
+ "砵": 21235,
655
+ "砹": 22235,
656
+ "砻": 21623,
657
+ "硃": 22001,
658
+ "硇": 21834,
659
+ "硏": 21380,
660
+ "硖": 22270,
661
+ "硚": 21735,
662
+ "硤": 21465,
663
+ "碭": 21270,
664
+ "碲": 21189,
665
+ "磔": 22224,
666
+ "磧": 21797,
667
+ "磴": 21406,
668
+ "磾": 21258,
669
+ "礒": 21198,
670
+ "礬": 21710,
671
+ "礮": 21725,
672
+ "礽": 21287,
673
+ "祆": 21205,
674
+ "祏": 22351,
675
+ "祓": 22234,
676
+ "祔": 21328,
677
+ "祘": 22071,
678
+ "祧": 22176,
679
+ "祼": 21607,
680
+ "禑": 21746,
681
+ "禔": 22072,
682
+ "禕": 21425,
683
+ "禟": 21285,
684
+ "禡": 22230,
685
+ "禤": 21646,
686
+ "禥": 21852,
687
+ "禩": 21284,
688
+ "禰": 21702,
689
+ "禳": 22222,
690
+ "禵": 21286,
691
+ "秈": 22194,
692
+ "稃": 22193,
693
+ "稈": 21587,
694
+ "稙": 21981,
695
+ "穏": 21501,
696
+ "穡": 22120,
697
+ "穰": 21694,
698
+ "窣": 21321,
699
+ "窰": 21358,
700
+ "竈": 21882,
701
+ "竉": 21565,
702
+ "竑": 21672,
703
+ "竦": 22280,
704
+ "竪": 21442,
705
+ "笄": 22175,
706
+ "笘": 22538,
707
+ "笪": 21143,
708
+ "笭": 22415,
709
+ "笮": 21997,
710
+ "笳": 21613,
711
+ "筅": 22577,
712
+ "筧": 22236,
713
+ "筮": 21825,
714
+ "筴": 21708,
715
+ "箓": 22405,
716
+ "箧": 22271,
717
+ "箬": 21135,
718
+ "篋": 22122,
719
+ "篦": 21577,
720
+ "篾": 22016,
721
+ "簋": 22358,
722
+ "簒": 21539,
723
+ "簕": 22015,
724
+ "簣": 22061,
725
+ "籀": 22290,
726
+ "籙": 22205,
727
+ "籣": 21575,
728
+ "籾": 22519,
729
+ "粢": 21745,
730
+ "糌": 22575,
731
+ "糭": 21352,
732
+ "糴": 22116,
733
+ "糶": 22526,
734
+ "紇": 21658,
735
+ "紈": 22282,
736
+ "紥": 21723,
737
+ "紬": 22506,
738
+ "絀": 21626,
739
+ "絛": 22511,
740
+ "絜": 22059,
741
+ "綉": 21206,
742
+ "綝": 21823,
743
+ "綟": 22087,
744
+ "綣": 21750,
745
+ "綰": 21344,
746
+ "綷": 22141,
747
+ "緡": 21491,
748
+ "緱": 22167,
749
+ "緲": 22543,
750
+ "縉": 21326,
751
+ "縊": 21166,
752
+ "縐": 21677,
753
+ "縞": 21935,
754
+ "縠": 22400,
755
+ "縻": 21160,
756
+ "繑": 22558,
757
+ "繒": 22500,
758
+ "繖": 22368,
759
+ "繙": 21350,
760
+ "繯": 21360,
761
+ "繻": 22257,
762
+ "繾": 21749,
763
+ "纈": 22388,
764
+ "纘": 21636,
765
+ "纛": 21712,
766
+ "纥": 21892,
767
+ "绹": 22551,
768
+ "缬": 22583,
769
+ "缵": 21378,
770
+ "缶": 22238,
771
+ "罃": 22331,
772
+ "罅": 21622,
773
+ "罈": 21817,
774
+ "罉": 21657,
775
+ "罘": 22317,
776
+ "罟": 21737,
777
+ "罾": 22425,
778
+ "羋": 21895,
779
+ "羕": 21953,
780
+ "羗": 22253,
781
+ "羰": 21251,
782
+ "翀": 22426,
783
+ "翕": 21687,
784
+ "翥": 22353,
785
+ "翬": 22272,
786
+ "耖": 21590,
787
+ "耜": 22032,
788
+ "聃": 22154,
789
+ "聡": 21816,
790
+ "聵": 21988,
791
+ "肟": 22291,
792
+ "肼": 21912,
793
+ "胂": 22097,
794
+ "胐": 21985,
795
+ "胪": 22024,
796
+ "胼": 22323,
797
+ "脒": 22522,
798
+ "脧": 22211,
799
+ "脷": 21433,
800
+ "腍": 21335,
801
+ "腧": 22467,
802
+ "膣": 22544,
803
+ "膥": 21544,
804
+ "膶": 22477,
805
+ "臏": 21907,
806
+ "臚": 22242,
807
+ "舘": 21494,
808
+ "舢": 21848,
809
+ "舨": 21849,
810
+ "艉": 22540,
811
+ "艏": 21588,
812
+ "艶": 21650,
813
+ "艸": 22246,
814
+ "芗": 22204,
815
+ "苅": 22158,
816
+ "苌": 22265,
817
+ "苎": 21668,
818
+ "苧": 21699,
819
+ "苳": 21676,
820
+ "苴": 21238,
821
+ "苾": 22442,
822
+ "茆": 22208,
823
+ "茌": 22450,
824
+ "茘": 21213,
825
+ "茛": 21437,
826
+ "荑": 21572,
827
+ "莜": 22036,
828
+ "莨": 21774,
829
+ "莼": 21574,
830
+ "菉": 22510,
831
+ "菫": 22106,
832
+ "菰": 22478,
833
+ "菴": 21948,
834
+ "菻": 22469,
835
+ "萇": 21170,
836
+ "萜": 21785,
837
+ "萣": 22438,
838
+ "葰": 22178,
839
+ "葶": 21391,
840
+ "蒗": 22284,
841
+ "蒯": 21323,
842
+ "蒴": 21385,
843
+ "蒽": 22447,
844
+ "蓀": 21576,
845
+ "蓍": 22379,
846
+ "蓥": 21485,
847
+ "蔞": 21728,
848
+ "蔦": 21827,
849
+ "蔴": 21209,
850
+ "蕓": 21197,
851
+ "蕰": 21832,
852
+ "蕷": 21727,
853
+ "蕹": 22530,
854
+ "薜": 22289,
855
+ "薤": 21196,
856
+ "薫": 22397,
857
+ "薮": 22149,
858
+ "薺": 21545,
859
+ "藁": 21736,
860
+ "藪": 21162,
861
+ "藶": 21392,
862
+ "藷": 21637,
863
+ "藺": 21394,
864
+ "蘄": 22401,
865
+ "蘅": 21641,
866
+ "蘖": 21818,
867
+ "蘢": 22489,
868
+ "虯": 21551,
869
+ "虺": 21517,
870
+ "蚋": 22134,
871
+ "蚧": 22005,
872
+ "蚶": 22418,
873
+ "蚺": 22328,
874
+ "蛄": 21178,
875
+ "蛉": 22138,
876
+ "蛞": 21566,
877
+ "蛸": 22056,
878
+ "蛺": 22014,
879
+ "蜉": 21632,
880
+ "蜑": 21359,
881
+ "蜞": 22421,
882
+ "蜩": 21896,
883
+ "蝓": 21567,
884
+ "蝣": 21633,
885
+ "蝮": 21418,
886
+ "蝰": 22127,
887
+ "蝽": 21419,
888
+ "蝾": 22273,
889
+ "螄": 22571,
890
+ "螅": 21449,
891
+ "螈": 21487,
892
+ "螟": 22389,
893
+ "螣": 22536,
894
+ "螫": 22090,
895
+ "螭": 22319,
896
+ "螻": 21595,
897
+ "螽": 22494,
898
+ "蟌": 22574,
899
+ "蟜": 21364,
900
+ "蟧": 22316,
901
+ "蟳": 22542,
902
+ "蠄": 22315,
903
+ "蠊": 22549,
904
+ "蠏": 21598,
905
+ "蠑": 21486,
906
+ "蠖": 22115,
907
+ "蠲": 22102,
908
+ "蠵": 21721,
909
+ "衊": 21980,
910
+ "衎": 21458,
911
+ "衕": 22143,
912
+ "衚": 22142,
913
+ "衽": 21308,
914
+ "袛": 21752,
915
+ "袢": 21899,
916
+ "袴": 22099,
917
+ "裇": 21322,
918
+ "褘": 22561,
919
+ "褦": 21289,
920
+ "褸": 21204,
921
+ "襖": 21371,
922
+ "襦": 22399,
923
+ "覈": 21854,
924
+ "覡": 22118,
925
+ "覲": 21483,
926
+ "觚": 21153,
927
+ "觜": 21951,
928
+ "觳": 22432,
929
+ "觴": 21256,
930
+ "訃": 21923,
931
+ "訇": 21952,
932
+ "訌": 21374,
933
+ "訐": 21784,
934
+ "訢": 21429,
935
+ "訾": 21368,
936
+ "詁": 21240,
937
+ "詈": 22263,
938
+ "詎": 22332,
939
+ "詏": 21301,
940
+ "詒": 21704,
941
+ "詝": 21427,
942
+ "詧": 21638,
943
+ "諂": 22156,
944
+ "諉": 21518,
945
+ "諍": 22088,
946
+ "諛": 21761,
947
+ "諤": 21389,
948
+ "諶": 21893,
949
+ "謇": 22045,
950
+ "謖": 21962,
951
+ "謚": 21265,
952
+ "謡": 21857,
953
+ "謫": 21507,
954
+ "謳": 21259,
955
+ "譞": 21430,
956
+ "譟": 21776,
957
+ "讃": 22520,
958
+ "讉": 22107,
959
+ "讒": 22103,
960
+ "讖": 21618,
961
+ "讣": 22248,
962
+ "诂": 21652,
963
+ "诒": 21462,
964
+ "谡": 22547,
965
+ "谿": 21422,
966
+ "豕": 21236,
967
+ "豨": 22341,
968
+ "豳": 22173,
969
+ "豸": 22313,
970
+ "貉": 21482,
971
+ "貊": 21958,
972
+ "貍": 21310,
973
+ "賒": 21904,
974
+ "賛": 22239,
975
+ "賾": 22375,
976
+ "贄": 21496,
977
+ "贇": 22075,
978
+ "贔": 21247,
979
+ "贗": 22554,
980
+ "贽": 21508,
981
+ "赉": 21145,
982
+ "赟": 22076,
983
+ "跗": 22162,
984
+ "跣": 21341,
985
+ "踎": 21295,
986
+ "踭": 21147,
987
+ "踰": 21783,
988
+ "蹕": 21309,
989
+ "蹠": 21917,
990
+ "躂": 21293,
991
+ "躄": 21361,
992
+ "躅": 22110,
993
+ "躑": 22580,
994
+ "躝": 21362,
995
+ "軔": 21977,
996
+ "軚": 21738,
997
+ "軛": 21930,
998
+ "軨": 22486,
999
+ "軫": 21193,
1000
+ "軭": 22417,
1001
+ "軻": 22215,
1002
+ "輅": 22531,
1003
+ "輋": 21446,
1004
+ "輜": 21996,
1005
+ "輞": 21354,
1006
+ "輦": 21307,
1007
+ "轂": 22108,
1008
+ "轫": 21594,
1009
+ "轸": 22529,
1010
+ "辎": 22165,
1011
+ "迨": 21968,
1012
+ "迳": 22481,
1013
+ "迾": 22514,
1014
+ "逄": 21983,
1015
+ "逋": 21305,
1016
+ "逑": 21719,
1017
+ "逖": 21838,
1018
+ "逯": 22112,
1019
+ "逳": 21229,
1020
+ "逹": 21873,
1021
+ "遯": 22412,
1022
+ "遹": 21683,
1023
+ "邗": 21495,
1024
+ "邙": 22395,
1025
+ "邠": 21831,
1026
+ "邡": 21484,
1027
+ "邴": 22245,
1028
+ "邶": 21243,
1029
+ "邽": 22491,
1030
+ "邾": 21605,
1031
+ "郃": 21903,
1032
+ "郇": 21302,
1033
+ "郏": 21410,
1034
+ "郓": 21753,
1035
+ "郕": 22155,
1036
+ "郗": 21998,
1037
+ "郛": 21453,
1038
+ "郞": 21131,
1039
+ "郯": 21824,
1040
+ "郾": 21789,
1041
+ "郿": 21961,
1042
+ "鄄": 22337,
1043
+ "鄆": 21843,
1044
+ "鄕": 21171,
1045
+ "鄖": 22025,
1046
+ "鄘": 21767,
1047
+ "鄚": 22364,
1048
+ "鄜": 22480,
1049
+ "鄣": 21620,
1050
+ "鄩": 21370,
1051
+ "鄫": 22479,
1052
+ "鄮": 22434,
1053
+ "鄯": 22326,
1054
+ "鄴": 21154,
1055
+ "酃": 22484,
1056
+ "酆": 21208,
1057
+ "酈": 22361,
1058
+ "酎": 22524,
1059
+ "醂": 22019,
1060
+ "醌": 22448,
1061
+ "醢": 22461,
1062
+ "釆": 22286,
1063
+ "釓": 22089,
1064
+ "釔": 22080,
1065
+ "釕": 21862,
1066
+ "釙": 21801,
1067
+ "釤": 22304,
1068
+ "釩": 21541,
1069
+ "釷": 21511,
1070
+ "釹": 21760,
1071
+ "鈁": 22548,
1072
+ "鈇": 21188,
1073
+ "鈈": 21490,
1074
+ "鈐": 21132,
1075
+ "鈧": 22079,
1076
+ "鈮": 21158,
1077
+ "鈰": 21481,
1078
+ "鈳": 22324,
1079
+ "鈷": 21317,
1080
+ "鈸": 21267,
1081
+ "鈹": 21478,
1082
+ "鈽": 21924,
1083
+ "鈿": 21413,
1084
+ "鉆": 21474,
1085
+ "鉈": 21875,
1086
+ "鉍": 21316,
1087
+ "鉕": 21900,
1088
+ "鉝": 21526,
1089
+ "鉞": 21150,
1090
+ "鉢": 21631,
1091
+ "鉦": 22318,
1092
+ "鉨": 22586,
1093
+ "鉬": 21159,
1094
+ "鉭": 21629,
1095
+ "鉲": 22414,
1096
+ "鉸": 21443,
1097
+ "鉺": 22218,
1098
+ "鉼": 22440,
1099
+ "鉾": 21709,
1100
+ "鉿": 21946,
1101
+ "銚": 22187,
1102
+ "銛": 22140,
1103
+ "銠": 21318,
1104
+ "銣": 21185,
1105
+ "銥": 21274,
1106
+ "銦": 21186,
1107
+ "銨": 22026,
1108
+ "銩": 22449,
1109
+ "銪": 22311,
1110
+ "銫": 21183,
1111
+ "銲": 22476,
1112
+ "銶": 21523,
1113
+ "銻": 22098,
1114
+ "銼": 21230,
1115
+ "鋂": 22460,
1116
+ "鋆": 21139,
1117
+ "鋇": 21466,
1118
+ "鋌": 22348,
1119
+ "鋐": 21897,
1120
+ "鋦": 21680,
1121
+ "鋨": 21859,
1122
+ "鋭": 22074,
1123
+ "鋯": 21480,
1124
+ "鋱": 22302,
1125
+ "鋹": 21846,
1126
+ "錀": 22570,
1127
+ "錇": 22475,
1128
+ "錒": 21315,
1129
+ "錕": 21454,
1130
+ "錡": 21527,
1131
+ "錬": 22251,
1132
+ "錸": 22459,
1133
+ "錼": 22457,
1134
+ "鍀": 22568,
1135
+ "鍅": 21679,
1136
+ "鍆": 22499,
1137
+ "鍇": 21673,
1138
+ "鍔": 21337,
1139
+ "鍘": 22585,
1140
+ "鍚": 22073,
1141
+ "鍩": 22466,
1142
+ "鍬": 21563,
1143
+ "鍶": 22308,
1144
+ "鍼": 21840,
1145
+ "鎅": 21515,
1146
+ "鎘": 22264,
1147
+ "鎝": 22456,
1148
+ "鎢": 21813,
1149
+ "鎦": 22458,
1150
+ "鎩": 21891,
1151
+ "鎭": 21298,
1152
+ "鎰": 21747,
1153
+ "鎵": 21184,
1154
+ "鎶": 21187,
1155
+ "鎿": 22588,
1156
+ "鏃": 22109,
1157
+ "鏇": 21546,
1158
+ "鏊": 22503,
1159
+ "鏌": 21555,
1160
+ "鏐": 21331,
1161
+ "鏑": 22013,
1162
+ "鏵": 21597,
1163
+ "鏷": 22455,
1164
+ "鏸": 21340,
1165
+ "鏹": 22406,
1166
+ "鐐": 22009,
1167
+ "鐒": 22483,
1168
+ "鐙": 21830,
1169
+ "鐠": 22303,
1170
+ "鐨": 21860,
1171
+ "鐫": 22217,
1172
+ "鐽": 21861,
1173
+ "鐿": 21874,
1174
+ "鑀": 22535,
1175
+ "鑌": 21782,
1176
+ "鑛": 22528,
1177
+ "鑠": 21304,
1178
+ "鑥": 22573,
1179
+ "鑪": 21624,
1180
+ "鑭": 21662,
1181
+ "鑴": 22268,
1182
+ "鑷": 22381,
1183
+ "钆": 22560,
1184
+ "钇": 21472,
1185
+ "钋": 21876,
1186
+ "钌": 21493,
1187
+ "钍": 21879,
1188
+ "钐": 22309,
1189
+ "钕": 21471,
1190
+ "钚": 22427,
1191
+ "钪": 22306,
1192
+ "钫": 22047,
1193
+ "钬": 22334,
1194
+ "钷": 22307,
1195
+ "钽": 21666,
1196
+ "铈": 21164,
1197
+ "铊": 21877,
1198
+ "铋": 21570,
1199
+ "铌": 22083,
1200
+ "铍": 21667,
1201
+ "铑": 22422,
1202
+ "铒": 22333,
1203
+ "铟": 21192,
1204
+ "铥": 22565,
1205
+ "铪": 22428,
1206
+ "铯": 21163,
1207
+ "铱": 22546,
1208
+ "铳": 21726,
1209
+ "铷": 22454,
1210
+ "铼": 22430,
1211
+ "铽": 22310,
1212
+ "锇": 22523,
1213
+ "锕": 22452,
1214
+ "锗": 21800,
1215
+ "锴": 22152,
1216
+ "锶": 22534,
1217
+ "锷": 22048,
1218
+ "镆": 22365,
1219
+ "镎": 22404,
1220
+ "镒": 22437,
1221
+ "镓": 21191,
1222
+ "镝": 21965,
1223
+ "镠": 22041,
1224
+ "镤": 22429,
1225
+ "镥": 22564,
1226
+ "镧": 21878,
1227
+ "镨": 22305,
1228
+ "閂": 21221,
1229
+ "閆": 22051,
1230
+ "閤": 22189,
1231
+ "閦": 21653,
1232
+ "閪": 21868,
1233
+ "閬": 21619,
1234
+ "閭": 21212,
1235
+ "閰": 22223,
1236
+ "閼": 22114,
1237
+ "闈": 21324,
1238
+ "闋": 22057,
1239
+ "闐": 21585,
1240
+ "闓": 21463,
1241
+ "闞": 22262,
1242
+ "闥": 22003,
1243
+ "闳": 21260,
1244
+ "闼": 21754,
1245
+ "闿": 21664,
1246
+ "阋": 22493,
1247
+ "阗": 22002,
1248
+ "阯": 22171,
1249
+ "陉": 22168,
1250
+ "陔": 21674,
1251
+ "陘": 22064,
1252
+ "陜": 22166,
1253
+ "陬": 22153,
1254
+ "隗": 21365,
1255
+ "隰": 22121,
1256
+ "隷": 21400,
1257
+ "隹": 21693,
1258
+ "雩": 22363,
1259
+ "雫": 22233,
1260
+ "霅": 22398,
1261
+ "靑": 22314,
1262
+ "靺": 21280,
1263
+ "鞨": 21281,
1264
+ "鞮": 21743,
1265
+ "韁": 22207,
1266
+ "韃": 21294,
1267
+ "韙": 22555,
1268
+ "韞": 21593,
1269
+ "韡": 21744,
1270
+ "韮": 21706,
1271
+ "頊": 21367,
1272
+ "頎": 22505,
1273
+ "頦": 21922,
1274
+ "頴": 21645,
1275
+ "頵": 22288,
1276
+ "顒": 21794,
1277
+ "顓": 21366,
1278
+ "顕": 22552,
1279
+ "顗": 21136,
1280
+ "顥": 21682,
1281
+ "顳": 21606,
1282
+ "颙": 22243,
1283
+ "颺": 21635,
1284
+ "飈": 21979,
1285
+ "飭": 22050,
1286
+ "飴": 21250,
1287
+ "餑": 22113,
1288
+ "餬": 22182,
1289
+ "餸": 21223,
1290
+ "餽": 22148,
1291
+ "餿": 22492,
1292
+ "饉": 21866,
1293
+ "馱": 21313,
1294
+ "駖": 22382,
1295
+ "駙": 21758,
1296
+ "駟": 21773,
1297
+ "駡": 22335,
1298
+ "駢": 21839,
1299
+ "駰": 21423,
1300
+ "騒": 21578,
1301
+ "騤": 21915,
1302
+ "騫": 21257,
1303
+ "騭": 21829,
1304
+ "騮": 21200,
1305
+ "騾": 21167,
1306
+ "驁": 21351,
1307
+ "驃": 21473,
1308
+ "驄": 21383,
1309
+ "驤": 21612,
1310
+ "驩": 22085,
1311
+ "驪": 21140,
1312
+ "骘": 22423,
1313
+ "骹": 21237,
1314
+ "髀": 21203,
1315
+ "髁": 22254,
1316
+ "髑": 22031,
1317
+ "髙": 22221,
1318
+ "髡": 22349,
1319
+ "髢": 22507,
1320
+ "髭": 21254,
1321
+ "髹": 22062,
1322
+ "鬅": 21564,
1323
+ "鬘": 22066,
1324
+ "鬨": 21764,
1325
+ "鬩": 22117,
1326
+ "鬪": 21338,
1327
+ "鬬": 21697,
1328
+ "鬭": 21956,
1329
+ "鬯": 21643,
1330
+ "鬲": 21369,
1331
+ "鬻": 21261,
1332
+ "魎": 22463,
1333
+ "魟": 21730,
1334
+ "魨": 21181,
1335
+ "魮": 22128,
1336
+ "魴": 22498,
1337
+ "鮈": 21990,
1338
+ "鮋": 21890,
1339
+ "鮎": 21146,
1340
+ "鮒": 22512,
1341
+ "鮟": 21342,
1342
+ "鮫": 21246,
1343
+ "鯀": 22007,
1344
+ "鯇": 21870,
1345
+ "鯓": 22163,
1346
+ "鯡": 21707,
1347
+ "鯢": 21982,
1348
+ "鯤": 21889,
1349
+ "鯥": 22387,
1350
+ "鯪": 21554,
1351
+ "鯭": 22123,
1352
+ "鯷": 21718,
1353
+ "鯿": 22487,
1354
+ "鰂": 21207,
1355
+ "鰈": 22275,
1356
+ "鰐": 22344,
1357
+ "鰕": 22082,
1358
+ "鰨": 22345,
1359
+ "鰩": 22130,
1360
+ "鰱": 22312,
1361
+ "鰹": 21409,
1362
+ "鰺": 21729,
1363
+ "鱀": 21820,
1364
+ "鱇": 21343,
1365
+ "鱊": 22129,
1366
+ "鱒": 21319,
1367
+ "鱘": 21417,
1368
+ "鱟": 22020,
1369
+ "鱥": 22590,
1370
+ "鱧": 22557,
1371
+ "鱭": 21986,
1372
+ "鱲": 21151,
1373
+ "鲀": 22081,
1374
+ "鲂": 21424,
1375
+ "鲎": 22587,
1376
+ "鲧": 22581,
1377
+ "鲮": 21933,
1378
+ "鲽": 22336,
1379
+ "鳊": 22220,
1380
+ "鳎": 22545,
1381
+ "鳔": 21819,
1382
+ "鳟": 22327,
1383
+ "鳧": 22283,
1384
+ "鳯": 21201,
1385
+ "鳽": 22029,
1386
+ "鴆": 22259,
1387
+ "鴇": 21634,
1388
+ "鴒": 21778,
1389
+ "鴝": 21357,
1390
+ "鴞": 21182,
1391
+ "鴟": 21610,
1392
+ "鴣": 21561,
1393
+ "鴯": 21803,
1394
+ "鴴": 21660,
1395
+ "鴷": 22136,
1396
+ "鵐": 22497,
1397
+ "鵞": 21845,
1398
+ "鵟": 22132,
1399
+ "鵠": 21850,
1400
+ "鵪": 21556,
1401
+ "鵯": 22584,
1402
+ "鶇": 21562,
1403
+ "鶉": 21557,
1404
+ "鶒": 22562,
1405
+ "鶓": 21804,
1406
+ "鶚": 22354,
1407
+ "鶡": 21194,
1408
+ "鶲": 21356,
1409
+ "鶺": 21777,
1410
+ "鶻": 21332,
1411
+ "鶿": 21348,
1412
+ "鷀": 21553,
1413
+ "鷂": 21220,
1414
+ "鷄": 21336,
1415
+ "鷈": 21809,
1416
+ "鷉": 21807,
1417
+ "鷓": 21560,
1418
+ "鷯": 21810,
1419
+ "鷸": 21805,
1420
+ "鷿": 21808,
1421
+ "鸂": 22563,
1422
+ "鸊": 21806,
1423
+ "鸌": 22137,
1424
+ "鸕": 21347,
1425
+ "鸛": 21569,
1426
+ "鸝": 21925,
1427
+ "鸩": 22338,
1428
+ "鸫": 21608,
1429
+ "鸮": 21799,
1430
+ "鸰": 21780,
1431
+ "鹀": 22133,
1432
+ "鹗": 21934,
1433
+ "鹡": 21779,
1434
+ "鹩": 22135,
1435
+ "鹬": 22516,
1436
+ "鹱": 21520,
1437
+ "麖": 22018,
1438
+ "麪": 21252,
1439
+ "麯": 22532,
1440
+ "麹": 21531,
1441
+ "麿": 21492,
1442
+ "黐": 21586,
1443
+ "黟": 21273,
1444
+ "黥": 21696,
1445
+ "黧": 21353,
1446
+ "黷": 22077,
1447
+ "黻": 21138,
1448
+ "黼": 21966,
1449
+ "黽": 22043,
1450
+ "鼆": 22393,
1451
+ "鼇": 21908,
1452
+ "鼍": 22541,
1453
+ "鼩": 22091,
1454
+ "鼯": 22093,
1455
+ "鼱": 22125,
1456
+ "鼴": 22092,
1457
+ "鼷": 22038,
1458
+ "齧": 22126,
1459
+ "齮": 22195,
1460
+ "齶": 22027,
1461
+ "龑": 21584,
1462
+ "龠": 21395,
1463
+ "龢": 21241,
1464
+ "鿕": 22297,
1465
+ "鿫": 21856,
1466
+ "鿬": 22569
1467
+ }
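A token-to-id mapping like the one that ends above can be sanity-checked. The sketch below is an illustration, not part of the commit: it copies four entries from the listing and checks that each id is unique and below the `vocab_size` of 22593 declared in `config.json`. The helper name `find_id_problems` is hypothetical.

```python
# A few entries copied from the mapping above; the real file is much longer.
SAMPLE = {"龢": 21241, "鿕": 22297, "鿫": 21856, "鿬": 22569}
VOCAB_SIZE = 22593  # "vocab_size" from config.json in this commit


def find_id_problems(token_to_id, vocab_size):
    """Return human-readable problems: out-of-range or duplicate ids."""
    problems = []
    seen = {}  # id -> first token that claimed it
    for token, idx in token_to_id.items():
        if not 0 <= idx < vocab_size:
            problems.append(f"{token!r}: id {idx} outside [0, {vocab_size})")
        if idx in seen:
            problems.append(f"{token!r}: id {idx} already used by {seen[idx]!r}")
        seen.setdefault(idx, token)
    return problems


print(find_id_problems(SAMPLE, VOCAB_SIZE))
```

Running the same check over the complete mapping would show whether any added token points outside the embedding matrix.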
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "./sts-cantonese",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "directionality": "bidi",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "pooler_fc_size": 768,
+   "pooler_num_attention_heads": 12,
+   "pooler_num_fc_layers": 3,
+   "pooler_size_per_head": 128,
+   "pooler_type": "first_token_transform",
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.33.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 22593
+ }
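The sizes in this config are enough to estimate the model's parameter count. The sketch below assumes the standard BERT encoder layout (token, position and segment embeddings, 12 identical layers, and a first-token pooler); that layout is an assumption, not something the commit states, and `bert_parameter_count` is a hypothetical helper name.

```python
def bert_parameter_count(cfg):
    """Count weights in a standard BERT encoder with a pooler (assumed layout)."""
    h = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and bias vectors

    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + cfg["type_vocab_size"] * h
        + layer_norm
    )
    # Query, key, value and output projections, each with a bias.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (expand, then project back), each with a bias.
    feed_forward = (h * inter + inter) + (inter * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


# Values copied from the config.json above.
CONFIG = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 22593,
}

params = bert_parameter_count(CONFIG)
print(params, params * 4)  # parameter count, and bytes at 4 bytes per float32
```

Under these assumptions the estimate is 103,392,768 parameters, or 413,571,072 bytes in float32. That is slightly below the 413,635,049 bytes reported by the Git LFS pointer for `pytorch_model.bin` in this commit; the remainder is plausibly serialization overhead.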
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.2.2",
+     "transformers": "4.33.1",
+     "pytorch": "2.0.1+cu117"
+   }
+ }
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
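`modules.json` chains a Transformer module and a Pooling module, and `1_Pooling/config.json` in this commit enables only `pooling_mode_mean_tokens`. The plain-Python sketch below illustrates what mean pooling over one sentence does, assuming the usual convention that padding positions carry an attention-mask value of 0. It is an illustration with a hypothetical function name, not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of the non-padding tokens of one sentence."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid dividing by zero on an all-padding row
    return [value / count for value in total]


# Three token vectors of width 2; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
```

In the real model each token vector has 768 components (`word_embedding_dimension`), so the pooled sentence embedding is also 768-dimensional.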
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ce052c23e7ce6dbb3d17cb88a4d1ad7b0b5686763b8ec209dcb4c1a37f2b544
+ size 413635049
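The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple `key value` line format shown here.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1ce052c23e7ce6dbb3d17cb88a4d1ad7b0b5686763b8ec209dcb4c1a37f2b544
size 413635049
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size"])
```

After downloading the real file, its size and SHA-256 digest can be compared against these values to confirm the download is intact.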
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "mask_token": "[MASK]",
+   "max_length": 437,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
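Two files in this commit disagree on one setting: `sentence_bert_config.json` sets `do_lower_case` to `true`, while `tokenizer_config.json` sets it to `false`. Which value takes effect depends on how the loading library combines them, and this sketch does not try to answer that. It only shows one way to surface such mismatches, using a hypothetical `conflicting_keys` helper and a subset of the keys from each file.

```python
def conflicting_keys(left, right):
    """Return {key: (left_value, right_value)} for shared keys that differ."""
    shared = sorted(left.keys() & right.keys())
    return {key: (left[key], right[key]) for key in shared if left[key] != right[key]}


# Values copied from sentence_bert_config.json in this commit.
SENTENCE_BERT_CONFIG = {"max_seq_length": 512, "do_lower_case": True}

# A subset of tokenizer_config.json from this commit.
TOKENIZER_CONFIG = {
    "do_lower_case": False,
    "max_length": 437,
    "model_max_length": 512,
}

print(conflicting_keys(SENTENCE_BERT_CONFIG, TOKENIZER_CONFIG))
```

Lowercasing mainly affects mixed Cantonese and Latin-script input, so it is worth testing both settings on such text before relying on the embeddings.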
vocab.txt ADDED
The diff for this file is too large to render. See raw diff