nishiwakikazutaka committed
Commit 37ce69f
1 Parent(s): 8c3d141
Initial release.
- LICENSE +359 -0
- README.md +94 -1
- config.json +27 -0
- model.safetensors +3 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +7 -0
- tokenizer_config.json +14 -0
- vocab.txt +0 -0
LICENSE
ADDED
@@ -0,0 +1,359 @@
|
1 |
+
Creative Commons Legal Code
|
2 |
+
|
3 |
+
Attribution-ShareAlike 3.0 Unported
|
4 |
+
|
5 |
+
CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
|
6 |
+
LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
|
7 |
+
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
|
8 |
+
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
|
9 |
+
REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR
|
10 |
+
DAMAGES RESULTING FROM ITS USE.
|
11 |
+
|
12 |
+
License
|
13 |
+
|
14 |
+
THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE
|
15 |
+
COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY
|
16 |
+
COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS
|
17 |
+
AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.
|
18 |
+
|
19 |
+
BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE
|
20 |
+
TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY
|
21 |
+
BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS
|
22 |
+
CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND
|
23 |
+
CONDITIONS.
|
24 |
+
|
25 |
+
1. Definitions
|
26 |
+
|
27 |
+
a. "Adaptation" means a work based upon the Work, or upon the Work and
|
28 |
+
other pre-existing works, such as a translation, adaptation,
|
29 |
+
derivative work, arrangement of music or other alterations of a
|
30 |
+
literary or artistic work, or phonogram or performance and includes
|
31 |
+
cinematographic adaptations or any other form in which the Work may be
|
32 |
+
recast, transformed, or adapted including in any form recognizably
|
33 |
+
derived from the original, except that a work that constitutes a
|
34 |
+
Collection will not be considered an Adaptation for the purpose of
|
35 |
+
this License. For the avoidance of doubt, where the Work is a musical
|
36 |
+
work, performance or phonogram, the synchronization of the Work in
|
37 |
+
timed-relation with a moving image ("synching") will be considered an
|
38 |
+
Adaptation for the purpose of this License.
|
39 |
+
b. "Collection" means a collection of literary or artistic works, such as
|
40 |
+
encyclopedias and anthologies, or performances, phonograms or
|
41 |
+
broadcasts, or other works or subject matter other than works listed
|
42 |
+
in Section 1(f) below, which, by reason of the selection and
|
43 |
+
arrangement of their contents, constitute intellectual creations, in
|
44 |
+
which the Work is included in its entirety in unmodified form along
|
45 |
+
with one or more other contributions, each constituting separate and
|
46 |
+
independent works in themselves, which together are assembled into a
|
47 |
+
collective whole. A work that constitutes a Collection will not be
|
48 |
+
considered an Adaptation (as defined below) for the purposes of this
|
49 |
+
License.
|
50 |
+
c. "Creative Commons Compatible License" means a license that is listed
|
51 |
+
at https://creativecommons.org/compatiblelicenses that has been
|
52 |
+
approved by Creative Commons as being essentially equivalent to this
|
53 |
+
License, including, at a minimum, because that license: (i) contains
|
54 |
+
terms that have the same purpose, meaning and effect as the License
|
55 |
+
Elements of this License; and, (ii) explicitly permits the relicensing
|
56 |
+
of adaptations of works made available under that license under this
|
57 |
+
License or a Creative Commons jurisdiction license with the same
|
58 |
+
License Elements as this License.
|
59 |
+
d. "Distribute" means to make available to the public the original and
|
60 |
+
copies of the Work or Adaptation, as appropriate, through sale or
|
61 |
+
other transfer of ownership.
|
62 |
+
e. "License Elements" means the following high-level license attributes
|
63 |
+
as selected by Licensor and indicated in the title of this License:
|
64 |
+
Attribution, ShareAlike.
|
65 |
+
f. "Licensor" means the individual, individuals, entity or entities that
|
66 |
+
offer(s) the Work under the terms of this License.
|
67 |
+
g. "Original Author" means, in the case of a literary or artistic work,
|
68 |
+
the individual, individuals, entity or entities who created the Work
|
69 |
+
or if no individual or entity can be identified, the publisher; and in
|
70 |
+
addition (i) in the case of a performance the actors, singers,
|
71 |
+
musicians, dancers, and other persons who act, sing, deliver, declaim,
|
72 |
+
play in, interpret or otherwise perform literary or artistic works or
|
73 |
+
expressions of folklore; (ii) in the case of a phonogram the producer
|
74 |
+
being the person or legal entity who first fixes the sounds of a
|
75 |
+
performance or other sounds; and, (iii) in the case of broadcasts, the
|
76 |
+
organization that transmits the broadcast.
|
77 |
+
h. "Work" means the literary and/or artistic work offered under the terms
|
78 |
+
of this License including without limitation any production in the
|
79 |
+
literary, scientific and artistic domain, whatever may be the mode or
|
80 |
+
form of its expression including digital form, such as a book,
|
81 |
+
pamphlet and other writing; a lecture, address, sermon or other work
|
82 |
+
of the same nature; a dramatic or dramatico-musical work; a
|
83 |
+
choreographic work or entertainment in dumb show; a musical
|
84 |
+
composition with or without words; a cinematographic work to which are
|
85 |
+
assimilated works expressed by a process analogous to cinematography;
|
86 |
+
a work of drawing, painting, architecture, sculpture, engraving or
|
87 |
+
lithography; a photographic work to which are assimilated works
|
88 |
+
expressed by a process analogous to photography; a work of applied
|
89 |
+
art; an illustration, map, plan, sketch or three-dimensional work
|
90 |
+
relative to geography, topography, architecture or science; a
|
91 |
+
performance; a broadcast; a phonogram; a compilation of data to the
|
92 |
+
extent it is protected as a copyrightable work; or a work performed by
|
93 |
+
a variety or circus performer to the extent it is not otherwise
|
94 |
+
considered a literary or artistic work.
|
95 |
+
i. "You" means an individual or entity exercising rights under this
|
96 |
+
License who has not previously violated the terms of this License with
|
97 |
+
respect to the Work, or who has received express permission from the
|
98 |
+
Licensor to exercise rights under this License despite a previous
|
99 |
+
violation.
|
100 |
+
j. "Publicly Perform" means to perform public recitations of the Work and
|
101 |
+
to communicate to the public those public recitations, by any means or
|
102 |
+
process, including by wire or wireless means or public digital
|
103 |
+
performances; to make available to the public Works in such a way that
|
104 |
+
members of the public may access these Works from a place and at a
|
105 |
+
place individually chosen by them; to perform the Work to the public
|
106 |
+
by any means or process and the communication to the public of the
|
107 |
+
performances of the Work, including by public digital performance; to
|
108 |
+
broadcast and rebroadcast the Work by any means including signs,
|
109 |
+
sounds or images.
|
110 |
+
k. "Reproduce" means to make copies of the Work by any means including
|
111 |
+
without limitation by sound or visual recordings and the right of
|
112 |
+
fixation and reproducing fixations of the Work, including storage of a
|
113 |
+
protected performance or phonogram in digital form or other electronic
|
114 |
+
medium.
|
115 |
+
|
116 |
+
2. Fair Dealing Rights. Nothing in this License is intended to reduce,
|
117 |
+
limit, or restrict any uses free from copyright or rights arising from
|
118 |
+
limitations or exceptions that are provided for in connection with the
|
119 |
+
copyright protection under copyright law or other applicable laws.
|
120 |
+
|
121 |
+
3. License Grant. Subject to the terms and conditions of this License,
|
122 |
+
Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
|
123 |
+
perpetual (for the duration of the applicable copyright) license to
|
124 |
+
exercise the rights in the Work as stated below:
|
125 |
+
|
126 |
+
a. to Reproduce the Work, to incorporate the Work into one or more
|
127 |
+
Collections, and to Reproduce the Work as incorporated in the
|
128 |
+
Collections;
|
129 |
+
b. to create and Reproduce Adaptations provided that any such Adaptation,
|
130 |
+
including any translation in any medium, takes reasonable steps to
|
131 |
+
clearly label, demarcate or otherwise identify that changes were made
|
132 |
+
to the original Work. For example, a translation could be marked "The
|
133 |
+
original work was translated from English to Spanish," or a
|
134 |
+
modification could indicate "The original work has been modified.";
|
135 |
+
c. to Distribute and Publicly Perform the Work including as incorporated
|
136 |
+
in Collections; and,
|
137 |
+
d. to Distribute and Publicly Perform Adaptations.
|
138 |
+
e. For the avoidance of doubt:
|
139 |
+
|
140 |
+
i. Non-waivable Compulsory License Schemes. In those jurisdictions in
|
141 |
+
which the right to collect royalties through any statutory or
|
142 |
+
compulsory licensing scheme cannot be waived, the Licensor
|
143 |
+
reserves the exclusive right to collect such royalties for any
|
144 |
+
exercise by You of the rights granted under this License;
|
145 |
+
ii. Waivable Compulsory License Schemes. In those jurisdictions in
|
146 |
+
which the right to collect royalties through any statutory or
|
147 |
+
compulsory licensing scheme can be waived, the Licensor waives the
|
148 |
+
exclusive right to collect such royalties for any exercise by You
|
149 |
+
of the rights granted under this License; and,
|
150 |
+
iii. Voluntary License Schemes. The Licensor waives the right to
|
151 |
+
collect royalties, whether individually or, in the event that the
|
152 |
+
Licensor is a member of a collecting society that administers
|
153 |
+
voluntary licensing schemes, via that society, from any exercise
|
154 |
+
by You of the rights granted under this License.
|
155 |
+
|
156 |
+
The above rights may be exercised in all media and formats whether now
|
157 |
+
known or hereafter devised. The above rights include the right to make
|
158 |
+
such modifications as are technically necessary to exercise the rights in
|
159 |
+
other media and formats. Subject to Section 8(f), all rights not expressly
|
160 |
+
granted by Licensor are hereby reserved.
|
161 |
+
|
162 |
+
4. Restrictions. The license granted in Section 3 above is expressly made
|
163 |
+
subject to and limited by the following restrictions:
|
164 |
+
|
165 |
+
a. You may Distribute or Publicly Perform the Work only under the terms
|
166 |
+
of this License. You must include a copy of, or the Uniform Resource
|
167 |
+
Identifier (URI) for, this License with every copy of the Work You
|
168 |
+
Distribute or Publicly Perform. You may not offer or impose any terms
|
169 |
+
on the Work that restrict the terms of this License or the ability of
|
170 |
+
the recipient of the Work to exercise the rights granted to that
|
171 |
+
recipient under the terms of the License. You may not sublicense the
|
172 |
+
Work. You must keep intact all notices that refer to this License and
|
173 |
+
to the disclaimer of warranties with every copy of the Work You
|
174 |
+
Distribute or Publicly Perform. When You Distribute or Publicly
|
175 |
+
Perform the Work, You may not impose any effective technological
|
176 |
+
measures on the Work that restrict the ability of a recipient of the
|
177 |
+
Work from You to exercise the rights granted to that recipient under
|
178 |
+
the terms of the License. This Section 4(a) applies to the Work as
|
179 |
+
incorporated in a Collection, but this does not require the Collection
|
180 |
+
apart from the Work itself to be made subject to the terms of this
|
181 |
+
License. If You create a Collection, upon notice from any Licensor You
|
182 |
+
must, to the extent practicable, remove from the Collection any credit
|
183 |
+
as required by Section 4(c), as requested. If You create an
|
184 |
+
Adaptation, upon notice from any Licensor You must, to the extent
|
185 |
+
practicable, remove from the Adaptation any credit as required by
|
186 |
+
Section 4(c), as requested.
|
187 |
+
b. You may Distribute or Publicly Perform an Adaptation only under the
|
188 |
+
terms of: (i) this License; (ii) a later version of this License with
|
189 |
+
the same License Elements as this License; (iii) a Creative Commons
|
190 |
+
jurisdiction license (either this or a later license version) that
|
191 |
+
contains the same License Elements as this License (e.g.,
|
192 |
+
Attribution-ShareAlike 3.0 US)); (iv) a Creative Commons Compatible
|
193 |
+
License. If you license the Adaptation under one of the licenses
|
194 |
+
mentioned in (iv), you must comply with the terms of that license. If
|
195 |
+
you license the Adaptation under the terms of any of the licenses
|
196 |
+
mentioned in (i), (ii) or (iii) (the "Applicable License"), you must
|
197 |
+
comply with the terms of the Applicable License generally and the
|
198 |
+
following provisions: (I) You must include a copy of, or the URI for,
|
199 |
+
the Applicable License with every copy of each Adaptation You
|
200 |
+
Distribute or Publicly Perform; (II) You may not offer or impose any
|
201 |
+
terms on the Adaptation that restrict the terms of the Applicable
|
202 |
+
License or the ability of the recipient of the Adaptation to exercise
|
203 |
+
the rights granted to that recipient under the terms of the Applicable
|
204 |
+
License; (III) You must keep intact all notices that refer to the
|
205 |
+
Applicable License and to the disclaimer of warranties with every copy
|
206 |
+
of the Work as included in the Adaptation You Distribute or Publicly
|
207 |
+
Perform; (IV) when You Distribute or Publicly Perform the Adaptation,
|
208 |
+
You may not impose any effective technological measures on the
|
209 |
+
Adaptation that restrict the ability of a recipient of the Adaptation
|
210 |
+
from You to exercise the rights granted to that recipient under the
|
211 |
+
terms of the Applicable License. This Section 4(b) applies to the
|
212 |
+
Adaptation as incorporated in a Collection, but this does not require
|
213 |
+
the Collection apart from the Adaptation itself to be made subject to
|
214 |
+
the terms of the Applicable License.
|
215 |
+
c. If You Distribute, or Publicly Perform the Work or any Adaptations or
|
216 |
+
Collections, You must, unless a request has been made pursuant to
|
217 |
+
Section 4(a), keep intact all copyright notices for the Work and
|
218 |
+
provide, reasonable to the medium or means You are utilizing: (i) the
|
219 |
+
name of the Original Author (or pseudonym, if applicable) if supplied,
|
220 |
+
and/or if the Original Author and/or Licensor designate another party
|
221 |
+
or parties (e.g., a sponsor institute, publishing entity, journal) for
|
222 |
+
attribution ("Attribution Parties") in Licensor's copyright notice,
|
223 |
+
terms of service or by other reasonable means, the name of such party
|
224 |
+
or parties; (ii) the title of the Work if supplied; (iii) to the
|
225 |
+
extent reasonably practicable, the URI, if any, that Licensor
|
226 |
+
specifies to be associated with the Work, unless such URI does not
|
227 |
+
refer to the copyright notice or licensing information for the Work;
|
228 |
+
and (iv) , consistent with Ssection 3(b), in the case of an
|
229 |
+
Adaptation, a credit identifying the use of the Work in the Adaptation
|
230 |
+
(e.g., "French translation of the Work by Original Author," or
|
231 |
+
"Screenplay based on original Work by Original Author"). The credit
|
232 |
+
required by this Section 4(c) may be implemented in any reasonable
|
233 |
+
manner; provided, however, that in the case of a Adaptation or
|
234 |
+
Collection, at a minimum such credit will appear, if a credit for all
|
235 |
+
contributing authors of the Adaptation or Collection appears, then as
|
236 |
+
part of these credits and in a manner at least as prominent as the
|
237 |
+
credits for the other contributing authors. For the avoidance of
|
238 |
+
doubt, You may only use the credit required by this Section for the
|
239 |
+
purpose of attribution in the manner set out above and, by exercising
|
240 |
+
Your rights under this License, You may not implicitly or explicitly
|
241 |
+
assert or imply any connection with, sponsorship or endorsement by the
|
242 |
+
Original Author, Licensor and/or Attribution Parties, as appropriate,
|
243 |
+
of You or Your use of the Work, without the separate, express prior
|
244 |
+
written permission of the Original Author, Licensor and/or Attribution
|
245 |
+
Parties.
|
246 |
+
d. Except as otherwise agreed in writing by the Licensor or as may be
|
247 |
+
otherwise permitted by applicable law, if You Reproduce, Distribute or
|
248 |
+
Publicly Perform the Work either by itself or as part of any
|
249 |
+
Adaptations or Collections, You must not distort, mutilate, modify or
|
250 |
+
take other derogatory action in relation to the Work which would be
|
251 |
+
prejudicial to the Original Author's honor or reputation. Licensor
|
252 |
+
agrees that in those jurisdictions (e.g. Japan), in which any exercise
|
253 |
+
of the right granted in Section 3(b) of this License (the right to
|
254 |
+
make Adaptations) would be deemed to be a distortion, mutilation,
|
255 |
+
modification or other derogatory action prejudicial to the Original
|
256 |
+
Author's honor and reputation, the Licensor will waive or not assert,
|
257 |
+
as appropriate, this Section, to the fullest extent permitted by the
|
258 |
+
applicable national law, to enable You to reasonably exercise Your
|
259 |
+
right under Section 3(b) of this License (right to make Adaptations)
|
260 |
+
but not otherwise.
|
261 |
+
|
262 |
+
5. Representations, Warranties and Disclaimer
|
263 |
+
|
264 |
+
UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, LICENSOR
|
265 |
+
OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY
|
266 |
+
KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE,
|
267 |
+
INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTIBILITY,
|
268 |
+
FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF
|
269 |
+
LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS,
|
270 |
+
WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION
|
271 |
+
OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.
|
272 |
+
|
273 |
+
6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE
|
274 |
+
LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR
|
275 |
+
ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES
|
276 |
+
ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS
|
277 |
+
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|
278 |
+
|
279 |
+
7. Termination
|
280 |
+
|
281 |
+
a. This License and the rights granted hereunder will terminate
|
282 |
+
automatically upon any breach by You of the terms of this License.
|
283 |
+
Individuals or entities who have received Adaptations or Collections
|
284 |
+
from You under this License, however, will not have their licenses
|
285 |
+
terminated provided such individuals or entities remain in full
|
286 |
+
compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will
|
287 |
+
survive any termination of this License.
|
288 |
+
b. Subject to the above terms and conditions, the license granted here is
|
289 |
+
perpetual (for the duration of the applicable copyright in the Work).
|
290 |
+
Notwithstanding the above, Licensor reserves the right to release the
|
291 |
+
Work under different license terms or to stop distributing the Work at
|
292 |
+
any time; provided, however that any such election will not serve to
|
293 |
+
withdraw this License (or any other license that has been, or is
|
294 |
+
required to be, granted under the terms of this License), and this
|
295 |
+
License will continue in full force and effect unless terminated as
|
296 |
+
stated above.
|
297 |
+
|
298 |
+
8. Miscellaneous
|
299 |
+
|
300 |
+
a. Each time You Distribute or Publicly Perform the Work or a Collection,
|
301 |
+
the Licensor offers to the recipient a license to the Work on the same
|
302 |
+
terms and conditions as the license granted to You under this License.
|
303 |
+
b. Each time You Distribute or Publicly Perform an Adaptation, Licensor
|
304 |
+
offers to the recipient a license to the original Work on the same
|
305 |
+
terms and conditions as the license granted to You under this License.
|
306 |
+
c. If any provision of this License is invalid or unenforceable under
|
307 |
+
applicable law, it shall not affect the validity or enforceability of
|
308 |
+
the remainder of the terms of this License, and without further action
|
309 |
+
by the parties to this agreement, such provision shall be reformed to
|
310 |
+
the minimum extent necessary to make such provision valid and
|
311 |
+
enforceable.
|
312 |
+
d. No term or provision of this License shall be deemed waived and no
|
313 |
+
breach consented to unless such waiver or consent shall be in writing
|
314 |
+
and signed by the party to be charged with such waiver or consent.
|
315 |
+
e. This License constitutes the entire agreement between the parties with
|
316 |
+
respect to the Work licensed here. There are no understandings,
|
317 |
+
agreements or representations with respect to the Work not specified
|
318 |
+
here. Licensor shall not be bound by any additional provisions that
|
319 |
+
may appear in any communication from You. This License may not be
|
320 |
+
modified without the mutual written agreement of the Licensor and You.
|
321 |
+
f. The rights granted under, and the subject matter referenced, in this
|
322 |
+
License were drafted utilizing the terminology of the Berne Convention
|
323 |
+
for the Protection of Literary and Artistic Works (as amended on
|
324 |
+
September 28, 1979), the Rome Convention of 1961, the WIPO Copyright
|
325 |
+
Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996
|
326 |
+
and the Universal Copyright Convention (as revised on July 24, 1971).
|
327 |
+
These rights and subject matter take effect in the relevant
|
328 |
+
jurisdiction in which the License terms are sought to be enforced
|
329 |
+
according to the corresponding provisions of the implementation of
|
330 |
+
those treaty provisions in the applicable national law. If the
|
331 |
+
standard suite of rights granted under applicable copyright law
|
332 |
+
includes additional rights not granted under this License, such
|
333 |
+
additional rights are deemed to be included in the License; this
|
334 |
+
License is not intended to restrict the license of any rights under
|
335 |
+
applicable law.
|
336 |
+
|
337 |
+
|
338 |
+
Creative Commons Notice
|
339 |
+
|
340 |
+
Creative Commons is not a party to this License, and makes no warranty
|
341 |
+
whatsoever in connection with the Work. Creative Commons will not be
|
342 |
+
liable to You or any party on any legal theory for any damages
|
343 |
+
whatsoever, including without limitation any general, special,
|
344 |
+
incidental or consequential damages arising in connection to this
|
345 |
+
license. Notwithstanding the foregoing two (2) sentences, if Creative
|
346 |
+
Commons has expressly identified itself as the Licensor hereunder, it
|
347 |
+
shall have all rights and obligations of Licensor.
|
348 |
+
|
349 |
+
Except for the limited purpose of indicating to the public that the
|
350 |
+
Work is licensed under the CCPL, Creative Commons does not authorize
|
351 |
+
the use by either party of the trademark "Creative Commons" or any
|
352 |
+
related trademark or logo of Creative Commons without the prior
|
353 |
+
written consent of Creative Commons. Any permitted use will be in
|
354 |
+
compliance with Creative Commons' then-current trademark usage
|
355 |
+
guidelines, as may be published on its website or otherwise made
|
356 |
+
available upon request from time to time. For the avoidance of doubt,
|
357 |
+
this trademark restriction does not form part of the License.
|
358 |
+
|
359 |
+
Creative Commons may be contacted at https://creativecommons.org/.
|
README.md
CHANGED
@@ -1,3 +1,96 @@
|
|
1 |
---
|
2 |
-
license: cc-by-3.0
|
3 |
---
|
1 |
---
|
2 |
+
license: cc-by-sa-3.0
|
3 |
+
language: ja
|
4 |
+
inference: false
|
5 |
---
|
6 |
+
|
7 |
+
# LayoutLM-wikipedia-ja Model
|
8 |
+
|
9 |
+
This is a [LayoutLM](https://doi.org/10.1145/3394486.3403172) model pretrained on Japanese text.
|
10 |
+
|
11 |
+
## Model Details
|
12 |
+
|
13 |
+
### Model Description
|
14 |
+
|
15 |
+
- **Developed by:** Advanced Technology Laboratory, The Japan Research Institute, Limited.
|
16 |
+
- **Model type:** LayoutLM
|
17 |
+
- **Language:** Japanese
|
18 |
+
- **License:** [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/)
|
19 |
+
- **Finetuned from model:** [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2)
|
20 |
+
|
21 |
+
## Uses
|
22 |
+
|
23 |
+
The model is primarily intended to be fine-tuned on token classification tasks. You can also use the raw model for masked language modeling, although that is not its primary use case. Refer to <https://github.com/nishiwakikazutaka/shinra2022-task2_jrird> for instructions on how to fine-tune the model. Note that the linked repository is written in Japanese.
|
24 |
+
|
25 |
+
## How to Get Started with the Model
|
26 |
+
|
27 |
+
Use the code below to get started with the model.
|
28 |
+
|
29 |
+
```python
|
30 |
+
>>> from transformers import AutoTokenizer, AutoModel
|
31 |
+
>>> import torch
|
32 |
+
|
33 |
+
>>> tokenizer = AutoTokenizer.from_pretrained("jri-advtechlab/layoutlm-wikipedia-ja")
|
34 |
+
>>> model = AutoModel.from_pretrained("jri-advtechlab/layoutlm-wikipedia-ja")
|
35 |
+
|
36 |
+
>>> tokens = tokenizer.tokenize("こんにちは") # ['こん', '##にち', '##は']
|
37 |
+
>>> normalized_token_boxes = [[637, 773, 693, 782], [693, 773, 749, 782], [749, 773, 775, 782]]
|
38 |
+
>>> # add bounding boxes of cls + sep tokens
|
39 |
+
>>> bbox = [[0, 0, 0, 0]] + normalized_token_boxes + [[1000, 1000, 1000, 1000]]
|
40 |
+
|
41 |
+
>>> input_ids = [tokenizer.cls_token_id] \
|
42 |
+
+ tokenizer.convert_tokens_to_ids(tokens) \
|
43 |
+
+ [tokenizer.sep_token_id]
|
44 |
+
>>> attention_mask = [1] * len(input_ids)
|
45 |
+
>>> token_type_ids = [0] * len(input_ids)
|
46 |
+
>>> encoding = {
|
47 |
+
"input_ids": torch.tensor([input_ids]),
|
48 |
+
"attention_mask": torch.tensor([attention_mask]),
|
49 |
+
"token_type_ids": torch.tensor([token_type_ids]),
|
50 |
+
"bbox": torch.tensor([bbox]),
|
51 |
+
}
|
52 |
+
|
53 |
+
>>> outputs = model(**encoding)
|
54 |
+
```
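The input construction in the snippet above can be wrapped in a small helper. The following is only a sketch: the function name `build_layoutlm_inputs` is hypothetical, it returns plain nested lists rather than tensors, and the range check assumes boxes are already normalized to the 0 to 1000 grid that the example boxes suggest.

```python
def build_layoutlm_inputs(token_ids, token_boxes, cls_id, sep_id):
    """Assemble LayoutLM-style inputs for one sequence (hypothetical helper).

    token_ids   : list of vocabulary ids, one per subword token
    token_boxes : list of [x0, y0, x1, y1] boxes, one per subword token,
                  assumed to be normalized to the 0..1000 range
    cls_id, sep_id : ids of the [CLS] and [SEP] tokens
    """
    if len(token_ids) != len(token_boxes):
        raise ValueError("need exactly one bounding box per token")
    for box in token_boxes:
        if len(box) != 4 or not all(0 <= v <= 1000 for v in box):
            raise ValueError(f"box out of range or malformed: {box}")

    # Same convention as the snippet above: [CLS] gets an all-zero box,
    # [SEP] gets an all-1000 box.
    input_ids = [cls_id] + list(token_ids) + [sep_id]
    bbox = [[0, 0, 0, 0]] + [list(b) for b in token_boxes] + [[1000, 1000, 1000, 1000]]

    # A leading batch dimension of size 1 is added to every field.
    return {
        "input_ids": [input_ids],
        "attention_mask": [[1] * len(input_ids)],
        "token_type_ids": [[0] * len(input_ids)],
        "bbox": [bbox],
    }
```

Each value in the returned dictionary can then be converted with `torch.tensor(...)` before being passed to the model, as in the snippet above.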
|
55 |
+
|
56 |
+
## Training Details
|
57 |
+
|
58 |
+
### Training Data
|
59 |
+
|
60 |
+
The model is trained on the Japanese version of Wikipedia. The training corpus is distributed as [training data of the SHINRA 2022 shared task](https://2022.shinra-project.info/data-download#subtask-common).
|
61 |
+
|
62 |
+
### Tokenization and Localization
|
63 |
+
|
64 |
+
We used the tokenizer of [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2) to split text into tokens (subwords). Each token is wrapped in a `<span>` tag with the `white-space` property set to `nowrap` and localized by obtaining its `BoundingClientRect`. Localization was performed with Google Chrome (106.0.5249.119) in headless mode on Ubuntu 20.04.5 LTS with a window size of 1,280×854.
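As an illustration of how a pixel-space rectangle could be mapped onto the integer 0 to 1000 grid that LayoutLM expects, here is a minimal sketch. The function name `normalize_box` is hypothetical, and whether the original pipeline divided by the window size or by the full rendered page size is not stated here, so `page_width` and `page_height` are left as caller-supplied assumptions.

```python
def normalize_box(left, top, right, bottom, page_width, page_height):
    """Scale a pixel-space box to the integer 0..1000 grid (hypothetical helper).

    page_width / page_height are whatever reference size the caller chooses;
    the reference size used for the released model is an open assumption.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")

    def scale(value, size):
        # Truncate to an integer and clamp so rounding or overflowing
        # elements never leave the valid embedding range.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(left, page_width),
        scale(top, page_height),
        scale(right, page_width),
        scale(bottom, page_height),
    ]
```

For example, `normalize_box(640, 427, 1280, 854, 1280, 854)` maps the lower-right quadrant of a 1,280×854 area to `[500, 500, 1000, 1000]`.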
|
65 |
+
|
66 |
+
The vocabulary is the same as [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2).
|
67 |
+
|
68 |
+
### Training Procedure
|
69 |
+
|
70 |
+
The model was trained with the Masked Visual-Language Model (MVLM) objective, but not with Multi-label Document Classification (MDC). We made this decision because we did not identify significant visual differences between Wikipedia articles, such as those between a contract and an invoice.
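The core idea of MVLM is that token ids are masked while their bounding boxes stay visible, so the model must predict a token from its context and its position on the page. The sketch below is a simplified illustration, not the training code used for this model: the name `mask_for_mvlm` is hypothetical, the 15% masking probability is the conventional BERT value assumed here, and every selected token is replaced by `[MASK]` rather than following BERT's 80/10/10 replacement scheme.

```python
import random


def mask_for_mvlm(input_ids, bbox, mask_id, special_ids, mask_prob=0.15, seed=None):
    """Mask token ids but keep their 2-D positions (simplified, hypothetical).

    Returns (masked_ids, bbox_copy, labels). Labels are -100 at positions that
    were not masked, the value PyTorch's cross-entropy loss ignores by default.
    """
    rng = random.Random(seed)
    masked_ids = list(input_ids)
    labels = [-100] * len(input_ids)

    for i, token_id in enumerate(input_ids):
        # Never mask special tokens such as [CLS] and [SEP].
        if token_id in special_ids:
            continue
        if rng.random() >= mask_prob:
            continue
        labels[i] = token_id      # the model must recover the original id
        masked_ids[i] = mask_id   # the text is hidden ...

    # ... but the layout is not: boxes are returned unchanged.
    return masked_ids, [list(b) for b in bbox], labels
```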
|
71 |
+
|
72 |
+
#### Preprocessing
|
73 |
+
|
74 |
+
All parameters except the 2-D Position Embedding were initialized with weights from [cl-tohoku/bert-base-japanese-v2](https://huggingface.co/cl-tohoku/bert-base-japanese-v2). We initialized the 2-D Position Embedding with random values.
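The initialization described above amounts to copying weights by name and skipping the 2-D position embeddings. The following is a minimal sketch over plain dictionaries standing in for state dicts. The function name `init_from_bert` is hypothetical, and both the `layoutlm.` to `bert.` prefix mapping and the four embedding names are assumptions about how the parameters are named, not something this model card specifies.

```python
# Assumed names of the 2-D position embedding tables (x, y, height, width).
TWO_D_KEYS = (
    "x_position_embeddings",
    "y_position_embeddings",
    "h_position_embeddings",
    "w_position_embeddings",
)


def init_from_bert(layoutlm_state, bert_state):
    """Copy BERT weights into a LayoutLM state dict (hypothetical sketch).

    Parameters whose name contains a 2-D position embedding key, or that have
    no counterpart in bert_state, keep their existing (random) values.
    Returns (merged_state, copied_names, kept_names).
    """
    merged = dict(layoutlm_state)
    copied, kept = [], []
    for name in layoutlm_state:
        # Assumed naming convention: only the encoder prefix differs.
        source_name = name.replace("layoutlm.", "bert.", 1)
        is_2d = any(key in name for key in TWO_D_KEYS)
        if is_2d or source_name not in bert_state:
            kept.append(name)
        else:
            merged[name] = bert_state[source_name]
            copied.append(name)
    return merged, copied, kept
```

Inspecting `kept_names` after such a copy is a quick way to confirm that only the intended parameters were left at their random initialization.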
|
75 |
+
|
76 |
+
#### Training Hyperparameters
|
77 |
+
|
78 |
+
The model was trained on 8 NVIDIA A100 SXM4 GPUs for 100,000 steps with a batch size of 256 and a maximum sequence length of 512. The optimizer was Adam with a learning rate of 5e-5, β<sub>1</sub>=0.9, β<sub>2</sub>=0.999, learning rate warmup for 1,000 steps, and linear decay of the learning rate afterwards. We also used fp16 mixed precision during training. Training took about 5.3 hours.
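The learning rate schedule described above can be written as a small function of the step number. This is a sketch under two assumptions that the text does not state: warmup is linear from zero, and the decay reaches zero exactly at the final step. The function name `learning_rate` is hypothetical.

```python
def learning_rate(step, peak_lr=5e-5, warmup_steps=1_000, total_steps=100_000):
    """Linear warmup followed by linear decay (assumed shape of the schedule)."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: shrink linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)
```

Under these assumptions the rate is half of its peak at step 500, reaches 5e-5 at step 1,000, and falls back to zero at step 100,000.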
|
79 |
+
|
80 |
+
## Evaluation
|
81 |
+
|
82 |
+
Our fine-tuned model achieved a macro-F1 score of 55.1451 on the leaderboard for the SHINRA 2022 shared task. See the leaderboard at [https://2022.shinra-project.info/#leaderboard](https://2022.shinra-project.info/#leaderboard) for detailed information.
|
83 |
+
|
84 |
+
## Citation
|
85 |
+
|
86 |
+
**BibTeX:**
|
87 |
+
|
88 |
+
```tex
|
89 |
+
@inproceedings{nishiwaki2023layoutlm-wiki-ja,
|
90 |
+
title = {日本語情報抽出タスクのための{L}ayout{LM}モデルの評価},
|
91 |
+
author = {西脇一尊 and 大沼俊輔 and 門脇一真},
|
92 |
+
booktitle = {言語処理学会第29回年次大会(NLP2023)予稿集},
|
93 |
+
year = {2023},
|
94 |
+
pages = {522--527}
|
95 |
+
}
|
96 |
+
```
|
config.json
ADDED
@@ -0,0 +1,27 @@
|
1 |
+
{
|
2 |
+
"_name_or_path": "layoutlm-wikipedia-ja",
|
3 |
+
"architectures": [
|
4 |
+
"LayoutLMForMaskedLM"
|
5 |
+
],
|
6 |
+
"attention_probs_dropout_prob": 0.1,
|
7 |
+
"classifier_dropout": null,
|
8 |
+
"hidden_act": "gelu",
|
9 |
+
"hidden_dropout_prob": 0.1,
|
10 |
+
"hidden_size": 768,
|
11 |
+
"initializer_range": 0.02,
|
12 |
+
"intermediate_size": 3072,
|
13 |
+
"layer_norm_eps": 1e-12,
|
14 |
+
"max_2d_position_embeddings": 1024,
|
15 |
+
"max_position_embeddings": 512,
|
16 |
+
"model_type": "layoutlm",
|
17 |
+
"num_attention_heads": 12,
|
18 |
+
"num_hidden_layers": 12,
|
19 |
+
"output_past": true,
|
20 |
+
"pad_token_id": 0,
|
21 |
+
"position_embedding_type": "absolute",
|
22 |
+
"torch_dtype": "float32",
|
23 |
+
"transformers_version": "4.20.1",
|
24 |
+
"type_vocab_size": 2,
|
25 |
+
"use_cache": true,
|
26 |
+
"vocab_size": 32768
|
27 |
+
}
|
model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:97129d713303ec4884d3460a94f846f4406e2f31f99aafbcb70d7ac53b4221a6
|
3 |
+
size 457434408
|
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:7c44bfc381a7255c9f2922f3aa12c19d0da26f32aa25924c94a3d74f06353f60
|
3 |
+
size 457481722
|
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
|
1 |
+
{
|
2 |
+
"cls_token": "[CLS]",
|
3 |
+
"mask_token": "[MASK]",
|
4 |
+
"pad_token": "[PAD]",
|
5 |
+
"sep_token": "[SEP]",
|
6 |
+
"unk_token": "[UNK]"
|
7 |
+
}
|
tokenizer_config.json
ADDED
@@ -0,0 +1,14 @@
|
1 |
+
{
|
2 |
+
"cls_token": "[CLS]",
|
3 |
+
"do_basic_tokenize": true,
|
4 |
+
"do_lower_case": false,
|
5 |
+
"mask_token": "[MASK]",
|
6 |
+
"name_or_path": "layoutlm-wikipedia-ja",
|
7 |
+
"never_split": null,
|
8 |
+
"pad_token": "[PAD]",
|
9 |
+
"sep_token": "[SEP]",
|
10 |
+
"strip_accents": false,
|
11 |
+
"tokenize_chinese_chars": false,
|
12 |
+
"tokenizer_class": "LayoutLMTokenizer",
|
13 |
+
"unk_token": "[UNK]"
|
14 |
+
}
|
vocab.txt
ADDED
The diff for this file is too large to render.