nishiwakikazutaka committed on
Commit
37ce69f
1 Parent(s): 8c3d141

Initial release.

Files changed (8)
  1. LICENSE +359 -0
  2. README.md +94 -1
  3. config.json +27 -0
  4. model.safetensors +3 -0
  5. pytorch_model.bin +3 -0
  6. special_tokens_map.json +7 -0
  7. tokenizer_config.json +14 -0
  8. vocab.txt +0 -0
LICENSE ADDED
@@ -0,0 +1,359 @@
+ Creative Commons Legal Code
+
+ Attribution-ShareAlike 3.0 Unported
+
+ CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
+ LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
+ ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
+ INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
+ REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR
+ DAMAGES RESULTING FROM ITS USE.
+
+ License
+
+ THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE
+ COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY
+ COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS
+ AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.
+
+ BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE
+ TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY
+ BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS
+ CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND
+ CONDITIONS.
+
+ 1. Definitions
+
+ a. "Adaptation" means a work based upon the Work, or upon the Work and
+ other pre-existing works, such as a translation, adaptation,
+ derivative work, arrangement of music or other alterations of a
+ literary or artistic work, or phonogram or performance and includes
+ cinematographic adaptations or any other form in which the Work may be
+ recast, transformed, or adapted including in any form recognizably
+ derived from the original, except that a work that constitutes a
+ Collection will not be considered an Adaptation for the purpose of
+ this License. For the avoidance of doubt, where the Work is a musical
+ work, performance or phonogram, the synchronization of the Work in
+ timed-relation with a moving image ("synching") will be considered an
+ Adaptation for the purpose of this License.
+ b. "Collection" means a collection of literary or artistic works, such as
+ encyclopedias and anthologies, or performances, phonograms or
+ broadcasts, or other works or subject matter other than works listed
+ in Section 1(f) below, which, by reason of the selection and
+ arrangement of their contents, constitute intellectual creations, in
+ which the Work is included in its entirety in unmodified form along
+ with one or more other contributions, each constituting separate and
+ independent works in themselves, which together are assembled into a
+ collective whole. A work that constitutes a Collection will not be
+ considered an Adaptation (as defined below) for the purposes of this
+ License.
+ c. "Creative Commons Compatible License" means a license that is listed
+ at https://creativecommons.org/compatiblelicenses that has been
+ approved by Creative Commons as being essentially equivalent to this
+ License, including, at a minimum, because that license: (i) contains
+ terms that have the same purpose, meaning and effect as the License
+ Elements of this License; and, (ii) explicitly permits the relicensing
+ of adaptations of works made available under that license under this
+ License or a Creative Commons jurisdiction license with the same
+ License Elements as this License.
+ d. "Distribute" means to make available to the public the original and
+ copies of the Work or Adaptation, as appropriate, through sale or
+ other transfer of ownership.
+ e. "License Elements" means the following high-level license attributes
+ as selected by Licensor and indicated in the title of this License:
+ Attribution, ShareAlike.
+ f. "Licensor" means the individual, individuals, entity or entities that
+ offer(s) the Work under the terms of this License.
+ g. "Original Author" means, in the case of a literary or artistic work,
+ the individual, individuals, entity or entities who created the Work
+ or if no individual or entity can be identified, the publisher; and in
+ addition (i) in the case of a performance the actors, singers,
+ musicians, dancers, and other persons who act, sing, deliver, declaim,
+ play in, interpret or otherwise perform literary or artistic works or
+ expressions of folklore; (ii) in the case of a phonogram the producer
+ being the person or legal entity who first fixes the sounds of a
+ performance or other sounds; and, (iii) in the case of broadcasts, the
+ organization that transmits the broadcast.
+ h. "Work" means the literary and/or artistic work offered under the terms
+ of this License including without limitation any production in the
+ literary, scientific and artistic domain, whatever may be the mode or
+ form of its expression including digital form, such as a book,
+ pamphlet and other writing; a lecture, address, sermon or other work
+ of the same nature; a dramatic or dramatico-musical work; a
+ choreographic work or entertainment in dumb show; a musical
+ composition with or without words; a cinematographic work to which are
+ assimilated works expressed by a process analogous to cinematography;
+ a work of drawing, painting, architecture, sculpture, engraving or
+ lithography; a photographic work to which are assimilated works
+ expressed by a process analogous to photography; a work of applied
+ art; an illustration, map, plan, sketch or three-dimensional work
+ relative to geography, topography, architecture or science; a
+ performance; a broadcast; a phonogram; a compilation of data to the
+ extent it is protected as a copyrightable work; or a work performed by
+ a variety or circus performer to the extent it is not otherwise
+ considered a literary or artistic work.
+ i. "You" means an individual or entity exercising rights under this
+ License who has not previously violated the terms of this License with
+ respect to the Work, or who has received express permission from the
+ Licensor to exercise rights under this License despite a previous
+ violation.
+ j. "Publicly Perform" means to perform public recitations of the Work and
+ to communicate to the public those public recitations, by any means or
+ process, including by wire or wireless means or public digital
+ performances; to make available to the public Works in such a way that
+ members of the public may access these Works from a place and at a
+ place individually chosen by them; to perform the Work to the public
+ by any means or process and the communication to the public of the
+ performances of the Work, including by public digital performance; to
+ broadcast and rebroadcast the Work by any means including signs,
+ sounds or images.
+ k. "Reproduce" means to make copies of the Work by any means including
+ without limitation by sound or visual recordings and the right of
+ fixation and reproducing fixations of the Work, including storage of a
+ protected performance or phonogram in digital form or other electronic
+ medium.
+
+ 2. Fair Dealing Rights. Nothing in this License is intended to reduce,
+ limit, or restrict any uses free from copyright or rights arising from
+ limitations or exceptions that are provided for in connection with the
+ copyright protection under copyright law or other applicable laws.
+
+ 3. License Grant. Subject to the terms and conditions of this License,
+ Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
+ perpetual (for the duration of the applicable copyright) license to
+ exercise the rights in the Work as stated below:
+
+ a. to Reproduce the Work, to incorporate the Work into one or more
+ Collections, and to Reproduce the Work as incorporated in the
+ Collections;
+ b. to create and Reproduce Adaptations provided that any such Adaptation,
+ including any translation in any medium, takes reasonable steps to
+ clearly label, demarcate or otherwise identify that changes were made
+ to the original Work. For example, a translation could be marked "The
+ original work was translated from English to Spanish," or a
+ modification could indicate "The original work has been modified.";
+ c. to Distribute and Publicly Perform the Work including as incorporated
+ in Collections; and,
+ d. to Distribute and Publicly Perform Adaptations.
+ e. For the avoidance of doubt:
+
+ i. Non-waivable Compulsory License Schemes. In those jurisdictions in
+ which the right to collect royalties through any statutory or
+ compulsory licensing scheme cannot be waived, the Licensor
+ reserves the exclusive right to collect such royalties for any
+ exercise by You of the rights granted under this License;
+ ii. Waivable Compulsory License Schemes. In those jurisdictions in
+ which the right to collect royalties through any statutory or
+ compulsory licensing scheme can be waived, the Licensor waives the
+ exclusive right to collect such royalties for any exercise by You
+ of the rights granted under this License; and,
+ iii. Voluntary License Schemes. The Licensor waives the right to
+ collect royalties, whether individually or, in the event that the
+ Licensor is a member of a collecting society that administers
+ voluntary licensing schemes, via that society, from any exercise
+ by You of the rights granted under this License.
+
+ The above rights may be exercised in all media and formats whether now
+ known or hereafter devised. The above rights include the right to make
+ such modifications as are technically necessary to exercise the rights in
+ other media and formats. Subject to Section 8(f), all rights not expressly
+ granted by Licensor are hereby reserved.
+
+ 4. Restrictions. The license granted in Section 3 above is expressly made
+ subject to and limited by the following restrictions:
+
+ a. You may Distribute or Publicly Perform the Work only under the terms
+ of this License. You must include a copy of, or the Uniform Resource
+ Identifier (URI) for, this License with every copy of the Work You
+ Distribute or Publicly Perform. You may not offer or impose any terms
+ on the Work that restrict the terms of this License or the ability of
+ the recipient of the Work to exercise the rights granted to that
+ recipient under the terms of the License. You may not sublicense the
+ Work. You must keep intact all notices that refer to this License and
+ to the disclaimer of warranties with every copy of the Work You
+ Distribute or Publicly Perform. When You Distribute or Publicly
+ Perform the Work, You may not impose any effective technological
+ measures on the Work that restrict the ability of a recipient of the
+ Work from You to exercise the rights granted to that recipient under
+ the terms of the License. This Section 4(a) applies to the Work as
+ incorporated in a Collection, but this does not require the Collection
+ apart from the Work itself to be made subject to the terms of this
+ License. If You create a Collection, upon notice from any Licensor You
+ must, to the extent practicable, remove from the Collection any credit
+ as required by Section 4(c), as requested. If You create an
+ Adaptation, upon notice from any Licensor You must, to the extent
+ practicable, remove from the Adaptation any credit as required by
+ Section 4(c), as requested.
+ b. You may Distribute or Publicly Perform an Adaptation only under the
+ terms of: (i) this License; (ii) a later version of this License with
+ the same License Elements as this License; (iii) a Creative Commons
+ jurisdiction license (either this or a later license version) that
+ contains the same License Elements as this License (e.g.,
+ Attribution-ShareAlike 3.0 US)); (iv) a Creative Commons Compatible
+ License. If you license the Adaptation under one of the licenses
+ mentioned in (iv), you must comply with the terms of that license. If
+ you license the Adaptation under the terms of any of the licenses
+ mentioned in (i), (ii) or (iii) (the "Applicable License"), you must
+ comply with the terms of the Applicable License generally and the
+ following provisions: (I) You must include a copy of, or the URI for,
+ the Applicable License with every copy of each Adaptation You
+ Distribute or Publicly Perform; (II) You may not offer or impose any
+ terms on the Adaptation that restrict the terms of the Applicable
+ License or the ability of the recipient of the Adaptation to exercise
+ the rights granted to that recipient under the terms of the Applicable
+ License; (III) You must keep intact all notices that refer to the
+ Applicable License and to the disclaimer of warranties with every copy
+ of the Work as included in the Adaptation You Distribute or Publicly
+ Perform; (IV) when You Distribute or Publicly Perform the Adaptation,
+ You may not impose any effective technological measures on the
+ Adaptation that restrict the ability of a recipient of the Adaptation
+ from You to exercise the rights granted to that recipient under the
+ terms of the Applicable License. This Section 4(b) applies to the
+ Adaptation as incorporated in a Collection, but this does not require
+ the Collection apart from the Adaptation itself to be made subject to
+ the terms of the Applicable License.
+ c. If You Distribute, or Publicly Perform the Work or any Adaptations or
+ Collections, You must, unless a request has been made pursuant to
+ Section 4(a), keep intact all copyright notices for the Work and
+ provide, reasonable to the medium or means You are utilizing: (i) the
+ name of the Original Author (or pseudonym, if applicable) if supplied,
+ and/or if the Original Author and/or Licensor designate another party
+ or parties (e.g., a sponsor institute, publishing entity, journal) for
+ attribution ("Attribution Parties") in Licensor's copyright notice,
+ terms of service or by other reasonable means, the name of such party
+ or parties; (ii) the title of the Work if supplied; (iii) to the
+ extent reasonably practicable, the URI, if any, that Licensor
+ specifies to be associated with the Work, unless such URI does not
+ refer to the copyright notice or licensing information for the Work;
+ and (iv) , consistent with Ssection 3(b), in the case of an
+ Adaptation, a credit identifying the use of the Work in the Adaptation
+ (e.g., "French translation of the Work by Original Author," or
+ "Screenplay based on original Work by Original Author"). The credit
+ required by this Section 4(c) may be implemented in any reasonable
+ manner; provided, however, that in the case of a Adaptation or
+ Collection, at a minimum such credit will appear, if a credit for all
+ contributing authors of the Adaptation or Collection appears, then as
+ part of these credits and in a manner at least as prominent as the
+ credits for the other contributing authors. For the avoidance of
+ doubt, You may only use the credit required by this Section for the
+ purpose of attribution in the manner set out above and, by exercising
+ Your rights under this License, You may not implicitly or explicitly
+ assert or imply any connection with, sponsorship or endorsement by the
+ Original Author, Licensor and/or Attribution Parties, as appropriate,
+ of You or Your use of the Work, without the separate, express prior
+ written permission of the Original Author, Licensor and/or Attribution
+ Parties.
+ d. Except as otherwise agreed in writing by the Licensor or as may be
+ otherwise permitted by applicable law, if You Reproduce, Distribute or
+ Publicly Perform the Work either by itself or as part of any
+ Adaptations or Collections, You must not distort, mutilate, modify or
+ take other derogatory action in relation to the Work which would be
+ prejudicial to the Original Author's honor or reputation. Licensor
+ agrees that in those jurisdictions (e.g. Japan), in which any exercise
+ of the right granted in Section 3(b) of this License (the right to
+ make Adaptations) would be deemed to be a distortion, mutilation,
+ modification or other derogatory action prejudicial to the Original
+ Author's honor and reputation, the Licensor will waive or not assert,
+ as appropriate, this Section, to the fullest extent permitted by the
+ applicable national law, to enable You to reasonably exercise Your
+ right under Section 3(b) of this License (right to make Adaptations)
+ but not otherwise.
+
+ 5. Representations, Warranties and Disclaimer
+
+ UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, LICENSOR
+ OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY
+ KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE,
+ INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTIBILITY,
+ FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF
+ LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS,
+ WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION
+ OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.
+
+ 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE
+ LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR
+ ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES
+ ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS
+ BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ 7. Termination
+
+ a. This License and the rights granted hereunder will terminate
+ automatically upon any breach by You of the terms of this License.
+ Individuals or entities who have received Adaptations or Collections
+ from You under this License, however, will not have their licenses
+ terminated provided such individuals or entities remain in full
+ compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will
+ survive any termination of this License.
+ b. Subject to the above terms and conditions, the license granted here is
+ perpetual (for the duration of the applicable copyright in the Work).
+ Notwithstanding the above, Licensor reserves the right to release the
+ Work under different license terms or to stop distributing the Work at
+ any time; provided, however that any such election will not serve to
+ withdraw this License (or any other license that has been, or is
+ required to be, granted under the terms of this License), and this
+ License will continue in full force and effect unless terminated as
+ stated above.
+
+ 8. Miscellaneous
+
+ a. Each time You Distribute or Publicly Perform the Work or a Collection,
+ the Licensor offers to the recipient a license to the Work on the same
+ terms and conditions as the license granted to You under this License.
+ b. Each time You Distribute or Publicly Perform an Adaptation, Licensor
+ offers to the recipient a license to the original Work on the same
+ terms and conditions as the license granted to You under this License.
+ c. If any provision of this License is invalid or unenforceable under
+ applicable law, it shall not affect the validity or enforceability of
+ the remainder of the terms of this License, and without further action
+ by the parties to this agreement, such provision shall be reformed to
+ the minimum extent necessary to make such provision valid and
+ enforceable.
+ d. No term or provision of this License shall be deemed waived and no
+ breach consented to unless such waiver or consent shall be in writing
+ and signed by the party to be charged with such waiver or consent.
+ e. This License constitutes the entire agreement between the parties with
+ respect to the Work licensed here. There are no understandings,
+ agreements or representations with respect to the Work not specified
+ here. Licensor shall not be bound by any additional provisions that
+ may appear in any communication from You. This License may not be
+ modified without the mutual written agreement of the Licensor and You.
+ f. The rights granted under, and the subject matter referenced, in this
+ License were drafted utilizing the terminology of the Berne Convention
+ for the Protection of Literary and Artistic Works (as amended on
+ September 28, 1979), the Rome Convention of 1961, the WIPO Copyright
+ Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996
+ and the Universal Copyright Convention (as revised on July 24, 1971).
+ These rights and subject matter take effect in the relevant
+ jurisdiction in which the License terms are sought to be enforced
+ according to the corresponding provisions of the implementation of
+ those treaty provisions in the applicable national law. If the
+ standard suite of rights granted under applicable copyright law
+ includes additional rights not granted under this License, such
+ additional rights are deemed to be included in the License; this
+ License is not intended to restrict the license of any rights under
+ applicable law.
+
+
+ Creative Commons Notice
+
+ Creative Commons is not a party to this License, and makes no warranty
+ whatsoever in connection with the Work. Creative Commons will not be
+ liable to You or any party on any legal theory for any damages
+ whatsoever, including without limitation any general, special,
+ incidental or consequential damages arising in connection to this
+ license. Notwithstanding the foregoing two (2) sentences, if Creative
+ Commons has expressly identified itself as the Licensor hereunder, it
+ shall have all rights and obligations of Licensor.
+
+ Except for the limited purpose of indicating to the public that the
+ Work is licensed under the CCPL, Creative Commons does not authorize
+ the use by either party of the trademark "Creative Commons" or any
+ related trademark or logo of Creative Commons without the prior
+ written consent of Creative Commons. Any permitted use will be in
+ compliance with Creative Commons' then-current trademark usage
+ guidelines, as may be published on its website or otherwise made
+ available upon request from time to time. For the avoidance of doubt,
+ this trademark restriction does not form part of the License.
+
+ Creative Commons may be contacted at https://creativecommons.org/.
README.md CHANGED
@@ -1,3 +1,96 @@
  ---
- license: cc-by-3.0
+ license: cc-by-sa-3.0
+ language: ja
+ inference: false
  ---
+
+ # LayoutLM-wikipedia-ja Model
+
+ This is a [LayoutLM](https://doi.org/10.1145/3394486.3403172) model pretrained on Japanese text.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Advanced Technology Laboratory, The Japan Research Institute, Limited.
+ - **Model type:** LayoutLM
+ - **Language:** Japanese
+ - **License:** [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/)
+ - **Finetuned from model:** [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2)
+
+ ## Uses
+
+ The model is primarily intended to be fine-tuned on token classification tasks. The raw model can also be used for masked language modeling, although that is not its primary use case. See <https://github.com/nishiwakikazutaka/shinra2022-task2_jrird> for fine-tuning instructions. Note that the linked repository is written in Japanese.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ >>> from transformers import AutoTokenizer, AutoModel
+ >>> import torch
+
+ >>> tokenizer = AutoTokenizer.from_pretrained("jri-advtechlab/layoutlm-wikipedia-ja")
+ >>> model = AutoModel.from_pretrained("jri-advtechlab/layoutlm-wikipedia-ja")
+
+ >>> tokens = tokenizer.tokenize("こんにちは")  # ['こん', '##にち', '##は']
+ >>> # one box per token, as [x0, y0, x1, y1] normalized to the 0-1000 range
+ >>> normalized_token_boxes = [[637, 773, 693, 782], [693, 773, 749, 782], [749, 773, 775, 782]]
+ >>> # add bounding boxes for the [CLS] and [SEP] tokens
+ >>> bbox = [[0, 0, 0, 0]] + normalized_token_boxes + [[1000, 1000, 1000, 1000]]
+
+ >>> input_ids = (
+ ...     [tokenizer.cls_token_id]
+ ...     + tokenizer.convert_tokens_to_ids(tokens)
+ ...     + [tokenizer.sep_token_id]
+ ... )
+ >>> attention_mask = [1] * len(input_ids)
+ >>> token_type_ids = [0] * len(input_ids)
+ >>> encoding = {
+ ...     "input_ids": torch.tensor([input_ids]),
+ ...     "attention_mask": torch.tensor([attention_mask]),
+ ...     "token_type_ids": torch.tensor([token_type_ids]),
+ ...     "bbox": torch.tensor([bbox]),
+ ... }
+
+ >>> outputs = model(**encoding)
+ ```
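
The README example passes boxes that are already normalized. As a minimal sketch of how pixel coordinates might be brought onto that scale, the helper below rescales a box to the 0-1000 grid that LayoutLM expects. The function name `normalize_box`, the sample page size, and the sample coordinates are illustrative assumptions, not part of the released code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 grid used by LayoutLM."""
    x0, y0, x1, y1 = box

    def scale(value, extent):
        # Clamp so that rounding or overflowing content never leaves [0, 1000].
        return min(1000, max(0, int(1000 * value / extent)))

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]


# Hypothetical page of 1000 x 500 pixels with one token box.
normalized = normalize_box((100, 50, 200, 100), page_width=1000, page_height=500)
```

The clamping is a defensive choice for this sketch; whether the original pipeline clamps or discards out-of-range boxes is not stated in the README.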
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was trained on the Japanese version of Wikipedia. The training corpus is distributed as [training data of the SHINRA 2022 shared task](https://2022.shinra-project.info/data-download#subtask-common).
+
+ ### Tokenization and Localization
+
+ We used the tokenizer of [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2) to split texts into tokens (subwords). Each token was wrapped in a `<span>` element with the CSS `white-space` property set to `nowrap`, and its position was obtained from the element's bounding client rectangle (`getBoundingClientRect()`). Localization was performed with Google Chrome 106.0.5249.119 in headless mode on Ubuntu 20.04.5 LTS with a window size of 1,280 × 854 pixels.
+
+ The vocabulary is the same as that of [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2).
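
To make the localization markup more concrete, here is a rough sketch of how token strings could be wrapped before rendering. The `wrap_tokens` helper and the `t0`, `t1`, ... element ids are hypothetical; the README does not say how the spans were generated or how subword prefixes such as `##` were handled, so the example simply passes surface strings.

```python
from html import escape


def wrap_tokens(tokens):
    """Wrap each token in a <span> that the browser will not break across lines."""
    spans = [
        f'<span id="t{i}" style="white-space: nowrap">{escape(token)}</span>'
        for i, token in enumerate(tokens)
    ]
    return "".join(spans)


# Surface strings of the three subwords of "こんにちは".
html_fragment = wrap_tokens(["こん", "にち", "は"])
```

In a browser, the rectangle of each span could then be read from the element, one box per token.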
+
+ ### Training Procedure
+
+ The model was trained with the Masked Visual-Language Model (MVLM) objective but not with Multi-label Document Classification (MDC). We made this choice because we did not find significant visual differences among Wikipedia articles, such as those that distinguish a contract from an invoice.
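
The core idea of MVLM is that token ids are masked while their 2-D positions stay visible. The sketch below illustrates that idea only. The 15% masking rate, the always-replace-with-`[MASK]` rule, and the function name `mask_for_mvlm` are assumptions borrowed from common BERT-style practice; the README does not state the settings actually used.

```python
import random


def mask_for_mvlm(input_ids, bbox, mask_token_id, special_ids, mask_prob=0.15, seed=None):
    """Mask token ids for MVLM while leaving every bounding box untouched."""
    rng = random.Random(seed)
    masked_ids = list(input_ids)
    labels = [-100] * len(input_ids)  # -100 marks positions the loss should ignore
    for i, token_id in enumerate(input_ids):
        # Never mask special tokens; mask the rest with probability mask_prob.
        if token_id in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = token_id            # the model must predict the original id
        masked_ids[i] = mask_token_id   # the text is hidden, the layout is kept
    return masked_ids, list(bbox), labels


# Hypothetical ids: 2 = [CLS], 3 = [SEP], 4 = [MASK].
ids = [2, 10, 11, 3]
boxes = [[0, 0, 0, 0], [637, 773, 693, 782], [693, 773, 749, 782], [1000, 1000, 1000, 1000]]
masked_ids, kept_boxes, labels = mask_for_mvlm(ids, boxes, 4, {2, 3}, mask_prob=1.0)
```

With `mask_prob=1.0` every non-special token is masked, which makes the behaviour easy to inspect; a realistic run would use a much lower rate.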
+
+ #### Preprocessing
+
+ All parameters except the 2-D Position Embedding were initialized with weights from [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2). The 2-D Position Embedding was initialized with random values.
+
+ #### Training Hyperparameters
+
+ The model was trained on 8 NVIDIA A100 SXM4 GPUs for 100,000 steps with a batch size of 256 and a maximum sequence length of 512. The optimizer was Adam with a learning rate of 5e-5, &beta;<sub>1</sub>=0.9, &beta;<sub>2</sub>=0.999, learning rate warmup for 1,000 steps, and linear decay of the learning rate afterwards. We also used fp16 mixed precision during training. Training took about 5.3 hours.
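
The learning-rate schedule described above can be sketched as a small function. The peak rate, warmup length, and step count come from the README; the assumptions are that warmup is linear from zero and that the decay reaches zero exactly at the final step, neither of which the README states. The name `learning_rate` is illustrative.

```python
def learning_rate(step, peak_lr=5e-5, warmup_steps=1_000, total_steps=100_000):
    """Linear warmup to peak_lr, then linear decay (assumed to end at zero)."""
    if step < warmup_steps:
        # Warmup phase: grow proportionally to the step count.
        return peak_lr * step / warmup_steps
    # Decay phase: shrink proportionally to the steps that remain.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Sample the schedule at a few points of the 100,000-step run.
samples = {step: learning_rate(step) for step in (0, 500, 1_000, 50_500, 100_000)}
```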
+
+ ## Evaluation
+
+ Our fine-tuned model achieved a macro-F1 score of 55.1451 on the leaderboard for the SHINRA 2022 shared task. See the leaderboard at <https://2022.shinra-project.info/#leaderboard> for details.
+
+ ## Citation
+
+ **BibTeX:**
+
+ ```tex
+ @inproceedings{nishiwaki2023layoutlm-wiki-ja,
+   title = {日本語情報抽出タスクのための{L}ayout{LM}モデルの評価},
+   author = {西脇一尊 and 大沼俊輔 and 門脇一真},
+   booktitle = {言語処理学会第29回年次大会(NLP2023)予稿集},
+   year = {2023},
+   pages = {522--527}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "layoutlm-wikipedia-ja",
+   "architectures": [
+     "LayoutLMForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_2d_position_embeddings": 1024,
+   "max_position_embeddings": 512,
+   "model_type": "layoutlm",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "output_past": true,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.20.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 32768
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97129d713303ec4884d3460a94f846f4406e2f31f99aafbcb70d7ac53b4221a6
+ size 457434408
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c44bfc381a7255c9f2922f3aa12c19d0da26f32aa25924c94a3d74f06353f60
+ size 457481722
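
The two weight files are committed as Git LFS pointers, not as the binaries themselves. As a hedged sketch, a downloaded file could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative assumptions; only the pointer text is taken from the diff above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash named by the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:97129d713303ec4884d3460a94f846f4406e2f31f99aafbcb70d7ac53b4221a6\n"
    "size 457434408\n"
)
```

Calling `matches_pointer` with the local path of a downloaded `model.safetensors` and this `pointer` would then confirm that the download is complete and intact.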
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": false,
+   "mask_token": "[MASK]",
+   "name_or_path": "layoutlm-wikipedia-ja",
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": false,
+   "tokenize_chinese_chars": false,
+   "tokenizer_class": "LayoutLMTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render.