annt committed on
Commit daeb223
1 parent: 045ef44

Add application file

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. LICENSE +201 -0
  2. app.py +155 -0
  3. configs/inference_vi_electra_base.yaml +41 -0
  4. configs/train_vi_electra_base.yaml +49 -0
  5. outputs/intensive/checkpoint-23900/config.json +30 -0
  6. outputs/intensive/checkpoint-23900/pytorch_model.bin +3 -0
  7. outputs/intensive/checkpoint-23900/rng_state.pth +3 -0
  8. outputs/intensive/checkpoint-23900/scaler.pt +3 -0
  9. outputs/intensive/checkpoint-23900/scheduler.pt +3 -0
  10. outputs/intensive/checkpoint-23900/special_tokens_map.json +7 -0
  11. outputs/intensive/checkpoint-23900/tokenizer.json +0 -0
  12. outputs/intensive/checkpoint-23900/tokenizer_config.json +15 -0
  13. outputs/intensive/checkpoint-23900/trainer_state.json +468 -0
  14. outputs/intensive/checkpoint-23900/training_args.bin +3 -0
  15. outputs/intensive/checkpoint-23900/vocab.txt +0 -0
  16. outputs/intensive/nbest_predictions.json +124 -0
  17. outputs/intensive/null_odds.json +3 -0
  18. outputs/intensive/predictions.json +3 -0
  19. outputs/sketch/checkpoint-23900/config.json +31 -0
  20. outputs/sketch/checkpoint-23900/pytorch_model.bin +3 -0
  21. outputs/sketch/checkpoint-23900/rng_state.pth +3 -0
  22. outputs/sketch/checkpoint-23900/scaler.pt +3 -0
  23. outputs/sketch/checkpoint-23900/scheduler.pt +3 -0
  24. outputs/sketch/checkpoint-23900/special_tokens_map.json +7 -0
  25. outputs/sketch/checkpoint-23900/tokenizer.json +0 -0
  26. outputs/sketch/checkpoint-23900/tokenizer_config.json +15 -0
  27. outputs/sketch/checkpoint-23900/trainer_state.json +408 -0
  28. outputs/sketch/checkpoint-23900/training_args.bin +3 -0
  29. outputs/sketch/checkpoint-23900/vocab.txt +0 -0
  30. outputs/sketch/cls_score.json +3 -0
  31. requirements.txt +6 -0
  32. retro_reader/__init__.py +3 -0
  33. retro_reader/__pycache__/__init__.cpython-310.pyc +0 -0
  34. retro_reader/__pycache__/__init__.cpython-37.pyc +0 -0
  35. retro_reader/__pycache__/__init__.cpython-38.pyc +0 -0
  36. retro_reader/__pycache__/base.cpython-310.pyc +0 -0
  37. retro_reader/__pycache__/base.cpython-37.pyc +0 -0
  38. retro_reader/__pycache__/base.cpython-38.pyc +0 -0
  39. retro_reader/__pycache__/constants.cpython-310.pyc +0 -0
  40. retro_reader/__pycache__/constants.cpython-37.pyc +0 -0
  41. retro_reader/__pycache__/constants.cpython-38.pyc +0 -0
  42. retro_reader/__pycache__/metrics.cpython-310.pyc +0 -0
  43. retro_reader/__pycache__/metrics.cpython-37.pyc +0 -0
  44. retro_reader/__pycache__/metrics.cpython-38.pyc +0 -0
  45. retro_reader/__pycache__/preprocess.cpython-310.pyc +0 -0
  46. retro_reader/__pycache__/preprocess.cpython-37.pyc +0 -0
  47. retro_reader/__pycache__/preprocess.cpython-38.pyc +0 -0
  48. retro_reader/__pycache__/retro_reader.cpython-310.pyc +0 -0
  49. retro_reader/__pycache__/retro_reader.cpython-37.pyc +0 -0
  50. retro_reader/__pycache__/retro_reader.cpython-38.pyc +0 -0
LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
app.py ADDED
@@ -0,0 +1,155 @@
+ import streamlit as st
+
+ import io
+ import os
+ import yaml
+ import pyarrow
+ import tokenizers
+
+
+ os.environ["TOKENIZERS_PARALLELISM"] = "true"
+
+ # SETTING PAGE CONFIG TO WIDE MODE
+ st.set_page_config(layout="wide")
+
+ @st.cache
+ def from_library():
+     from retro_reader import RetroReader
+     from retro_reader import constants as C
+     return C, RetroReader
+
+ C, RetroReader = from_library()
+
+ # https://stackoverflow.com/questions/70274841/streamlit-unhashable-typeerror-when-i-use-st-cache
+ my_hash_func = {
+     io.TextIOWrapper: lambda _: None,
+     pyarrow.lib.Buffer: lambda _: 0,
+     tokenizers.Tokenizer: lambda _: None,
+     tokenizers.AddedToken: lambda _: None
+ }
+
+ # @st.cache(hash_funcs=my_hash_func, allow_output_mutation=True)
+ # def load_ko_roberta_large_model():
+ #     config_file = "configs/inference_ko_roberta_large.yaml"
+ #     return RetroReader.load(config_file=config_file)
+
+
+ # @st.cache(hash_funcs=my_hash_func, allow_output_mutation=True)
+ # def load_ko_electra_small_model():
+ #     config_file = "configs/inference_ko_electra_small.yaml"
+ #     return RetroReader.load(config_file=config_file)
+
+
+ # @st.cache(hash_funcs=my_hash_func, allow_output_mutation=True)
+ # def load_en_electra_large_model():
+ #     config_file = "configs/inference_en_electra_large.yaml"
+ #     return RetroReader.load(config_file=config_file)
+
+ @st.cache(hash_funcs=my_hash_func, allow_output_mutation=True)
+ def load_vi_electra_base_model():
+     config_file = "configs/inference_vi_electra_base.yaml"
+     return RetroReader.load(config_file=config_file)
+
+ RETRO_READER_HOST = {
+     # "klue/roberta-large": load_ko_roberta_large_model(),
+     # "monologg/koelectra-small-v3-discriminator": load_ko_electra_small_model(),
+     "NlpHUST/electra-base-vn": load_vi_electra_base_model(),
+ }
+
+
+ def main():
+     st.title("Retrospective Reader Demo")
+
+     # st.markdown("## Model name")
+     # option = st.selectbox(
+     #     label="Choose the model used in retro reader",
+     #     options=(
+     #         # "[ko_KR] klue/roberta-large",
+     #         # "[ko_KR] monologg/koelectra-small-v3-discriminator",
+     #         "[vi_XX] NlpHUST/electra-base-vn",
+     #     ),
+     #     index=0,
+     # )
+     # lang_code, model_name = option.split(" ")
+
+     retro_reader = load_vi_electra_base_model()
+
+     # retro_reader = load_model()
+     lang_prefix = "EN"  # selects the EN_* example/help text in retro_reader.constants
+     height = 300
+
+     # retro_reader.null_score_diff_threshold = st.sidebar.slider(
+     #     label="null_score_diff_threshold",
+     #     min_value=-10.0, max_value=10.0, value=0.0, step=1.0,
+     #     help="ma!",
+     # )
+     # retro_reader.rear_threshold = st.sidebar.slider(
+     #     label="rear_threshold",
+     #     min_value=-10.0, max_value=10.0, value=0.0, step=1.0,
+     #     help="ma!",
+     # )
+     # retro_reader.n_best_size = st.sidebar.slider(
+     #     label="n_best_size",
+     #     min_value=1, max_value=50, value=20, step=1,
+     #     help="ma!",
+     # )
+     # retro_reader.beta1 = st.sidebar.slider(
+     #     label="beta1",
+     #     min_value=-10.0, max_value=10.0, value=1.0, step=1.0,
+     #     help="ma!",
+     # )
+     # retro_reader.beta2 = st.sidebar.slider(
+     #     label="beta2",
+     #     min_value=-10.0, max_value=10.0, value=1.0, step=1.0,
+     #     help="ma!",
+     # )
+     # retro_reader.best_cof = st.sidebar.slider(
+     #     label="best_cof",
+     #     min_value=-10.0, max_value=10.0, value=1.0, step=1.0,
+     #     help="ma!",
+     # )
+     # return_submodule_outputs = st.sidebar.checkbox('return_submodule_outputs', value=False)
+     return_submodule_outputs = False
+     st.markdown("## Demonstration")
+     with st.form(key="my_form"):
+         query = st.text_input(
+             label="Type your query",
+             value=getattr(C, f"{lang_prefix}_EXAMPLE_QUERY"),
+             max_chars=None,
+             help=getattr(C, f"{lang_prefix}_QUERY_HELP_TEXT"),
+         )
+         context = st.text_area(
+             label="Type your context",
+             value=getattr(C, f"{lang_prefix}_EXAMPLE_CONTEXTS"),
+             height=height,
+             max_chars=None,
+             help=getattr(C, f"{lang_prefix}_CONTEXT_HELP_TEXT"),
+         )
+         submit_button = st.form_submit_button(label="Submit")
+
+     if submit_button:
+         with st.spinner("Please wait..."):
+             outputs = retro_reader(
+                 query=query,
+                 context=context,
+                 return_submodule_outputs=return_submodule_outputs,
+             )
+         answer, score = outputs[0]["id-01"], outputs[1]
+         if not answer:
+             answer = "No answer"
+         st.markdown("## Results")
+         st.write(answer)
+         st.markdown("### Rear Verification Score")
+         st.json(score)
+         # if return_submodule_outputs:
+         #     score_ext, nbest_preds, score_diff = outputs[2:]
+         #     st.markdown("### Sketch Reader Score (score_ext)")
+         #     st.json(score_ext)
+         #     st.markdown("### Intensive Reader Score (score_diff)")
+         #     st.json(score_diff)
+         #     st.markdown("### N Best Predictions (from intensive reader)")
+         #     st.json(nbest_preds)
+
+
+ if __name__ == "__main__":
+     main()
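The app shows one "Rear Verification Score", and its commented-out branch refers to a sketch-reader score (`score_ext`) and an intensive-reader score (`score_diff`). The sketch below shows one way those two scores could be combined. It assumes the rear verification rule described for the Retrospective Reader:

- The intensive reader's score is `score_diff = score_null - best_span_score`, computed from its start/end logits.
- The two scores are combined as `v = beta1 * score_diff + beta2 * score_ext`.
- The reader predicts "no answer" when `v` exceeds `rear_threshold`.

The function names are hypothetical. How `retro_reader` itself maps `beta1`, `beta2` (and `best_cof`) onto these scores is not visible in this diff.

```python
from typing import Sequence, Tuple


def span_scores(start_logits: Sequence[float], end_logits: Sequence[float],
                max_answer_length: int = 30) -> Tuple[float, float]:
    """Return (score_null, best_span_score) from token-level QA logits.

    Assumption: index 0 is the [CLS] token, whose start+end logits form
    the "no answer" score; spans may not exceed max_answer_length tokens.
    """
    score_null = start_logits[0] + end_logits[0]
    best = float("-inf")
    n = len(start_logits)
    for i in range(1, n):
        # end index j satisfies j - i + 1 <= max_answer_length
        for j in range(i, min(n, i + max_answer_length)):
            best = max(best, start_logits[i] + end_logits[j])
    return score_null, best


def rear_verify(score_diff: float, score_ext: float, beta1: float = 1.0,
                beta2: float = 1.0, rear_threshold: float = 0.0) -> bool:
    """Hypothetical rear verification: True means 'answerable'."""
    v = beta1 * score_diff + beta2 * score_ext
    return v <= rear_threshold


null, best = span_scores([2.0, 0.1, 3.0, 0.2], [1.0, 0.3, 0.5, 2.5])
score_diff = null - best           # 3.0 - 5.5 = -2.5
score_ext = 0.2 - 1.0              # sketch reader: null logit minus answerable logit
print(rear_verify(score_diff, score_ext))
```

Both scores are "null minus answer" margins, so a larger `v` means stronger evidence for abstaining.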
configs/inference_vi_electra_base.yaml ADDED
@@ -0,0 +1,41 @@
+ RetroDataModelArguments:
+
+     # DataArguments
+     max_seq_length: 512
+     max_answer_length: 30
+     doc_stride: 128
+     return_token_type_ids: True
+     pad_to_max_length: True
+     preprocessing_num_workers: 5
+     overwrite_cache: False
+     version_2_with_negative: True
+     null_score_diff_threshold: 0.0
+     rear_threshold: 0.0
+     n_best_size: 20
+     use_choice_logits: False
+     start_n_top: -1
+     end_n_top: -1
+     beta1: 1
+     beta2: 1
+     best_cof: 1
+
+     # ModelArguments
+     use_auth_token: False
+
+     # SketchModelArguments
+     sketch_revision: en-electra-large-sketch
+     sketch_model_name: ./outputs/sketch/checkpoint-23900/
+     sketch_architectures: ElectraForSequenceClassification
+
+     # IntensiveModelArguments
+     intensive_revision: en-electra-largs-intensive
+     intensive_model_name: ./outputs/intensive/checkpoint-23900/
+     intensive_architectures: ElectraForQuestionAnsweringAVPool
+
+
+ TrainingArguments:
+     output_dir: outputs
+     no_cuda: True  # If you want to use cuda,
+                    # change `no_cuda` to False and `fp16` to True
+     per_device_train_batch_size: 1
+     per_device_eval_batch_size: 1
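With `max_seq_length: 512` and `doc_stride: 128`, a context longer than one window is split into several overlapping features. The following is a simplified sketch of that window arithmetic. It assumes the Hugging Face tokenizer convention, where `stride` is the number of context tokens shared by consecutive windows, and a `[CLS] question [SEP] context [SEP]` layout (three special tokens). Real tokenizers do this internally with `return_overflowing_tokens=True` and `stride=...`. The helper `context_windows` is hypothetical and only counts tokens.

```python
from typing import List, Tuple


def context_windows(context_len: int, question_len: int,
                    max_seq_length: int = 512, doc_stride: int = 128,
                    num_special_tokens: int = 3) -> List[Tuple[int, int]]:
    """Return the [start, end) context-token range covered by each feature."""
    max_ctx = max_seq_length - question_len - num_special_tokens
    if max_ctx <= doc_stride:
        raise ValueError("question too long for this max_seq_length/doc_stride")
    step = max_ctx - doc_stride  # new context tokens contributed by each extra window
    windows = []
    start = 0
    while True:
        end = min(start + max_ctx, context_len)
        windows.append((start, end))
        if end == context_len:
            break
        start += step
    return windows


# A 1,000-token context with a 20-token question yields three features.
print(context_windows(1000, 20))
```

Each window holds up to 489 context tokens here, and neighbouring windows share exactly `doc_stride` tokens. This means an answer span of up to 128 tokens near a boundary appears whole in at least one feature.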
configs/train_vi_electra_base.yaml ADDED
@@ -0,0 +1,49 @@
+ RetroDataModelArguments:
+
+     # DataArguments
+     max_seq_length: 512
+     max_answer_length: 30
+     doc_stride: 128
+     return_token_type_ids: True
+     pad_to_max_length: True
+     preprocessing_num_workers: 5
+     overwrite_cache: False
+     version_2_with_negative: True
+     null_score_diff_threshold: 0.0
+     rear_threshold: 0.0
+     n_best_size: 20
+     use_choice_logits: False
+     start_n_top: -1
+     end_n_top: -1
+     beta1: 1
+     beta2: 1
+     best_cof: 1
+
+     # SketchModelArguments
+     sketch_model_name: NlpHUST/electra-base-vn
+     sketch_architectures: ElectraForSequenceClassification
+
+     # IntensiveModelArguments
+     intensive_model_name: NlpHUST/electra-base-vn
+     intensive_architectures: ElectraForQuestionAnsweringAVPool
+
+
+ TrainingArguments:
+     report_to: wandb
+     run_name: squadv2-electra-large-sketch,squadv2-electra-large-intensive
+     output_dir: outputs
+     overwrite_output_dir: False
+     learning_rate: 2e-5
+     evaluation_strategy: epoch
+     save_strategy: epoch
+     per_device_train_batch_size: 12
+     per_device_eval_batch_size: 12
+     num_train_epochs: 10.0
+     # save_steps: 5000
+     save_total_limit: 2
+     fp16: True
+     warmup_ratio: 0.1
+     weight_decay: 0.01
+     load_best_model_at_end: True
+     metric_for_best_model: f1,exact
+     logging_dir: logs
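With `learning_rate: 2e-5`, `warmup_ratio: 0.1` and 10 epochs, the learning rates logged in `outputs/intensive/checkpoint-23900/trainer_state.json` are consistent with the Hugging Face `Trainer` default `linear` schedule. That schedule warms up linearly over ceil(0.1 × total steps) steps, then decays linearly to zero. A minimal sketch, assuming no `lr_scheduler_type` override and 23,900 total optimizer steps (the recorded `global_step`, i.e. 2,390 steps per epoch):

```python
import math


def linear_warmup_decay(step: int, total_steps: int = 23900,
                        peak_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate after `step` scheduler steps: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 2390 here
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for s in (500, 2390, 5000, 23900):
    print(s, linear_warmup_decay(s))
```

The logged values trail this curve slightly: 4.15e-06 is logged at step 500, versus about 4.18e-06 here. One plausible explanation, not verified in this diff, is that `fp16: True` loss scaling skipped a few early optimizer steps, and the scheduler is not advanced on skipped steps.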
outputs/intensive/checkpoint-23900/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "NlpHUST/electra-base-vn",
+   "architectures": [
+     "ElectraForQuestionAnsweringAVPool"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "embedding_size": 768,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "electra",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "summary_activation": "gelu",
+   "summary_last_dropout": 0.1,
+   "summary_type": "first",
+   "summary_use_proj": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.21.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 62000
+ }
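This config describes a BERT-base-sized ELECTRA encoder: 12 layers, hidden size 768, intermediate size 3072 and a 62,000-token vocabulary. The back-of-the-envelope parameter count below rests on two assumptions:

- The standard Hugging Face `ElectraModel` layout, with no embedding projection because `embedding_size` equals `hidden_size`.
- float32 weights (`"torch_dtype": "float32"`).

The estimate lands very close to the 532,350,957-byte `pytorch_model.bin` recorded in this checkpoint's Git LFS pointer. The custom `ElectraForQuestionAnsweringAVPool` head and serialization overhead are not counted. The helper name is hypothetical.

```python
def electra_backbone_params(vocab_size: int = 62000, hidden: int = 768,
                            intermediate: int = 3072, layers: int = 12,
                            max_positions: int = 512, type_vocab: int = 2) -> int:
    """Count weights and biases of an ElectraModel whose embedding_size == hidden_size."""
    layer_norm = 2 * hidden                                # gamma and beta
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)             # query, key, value, output dense
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + ffn + 2 * layer_norm           # two LayerNorms per layer
    return embeddings + layers * per_layer


params = electra_backbone_params()
fp32_bytes = params * 4
checkpoint_bytes = 532350957  # "size" field of the LFS pointer for pytorch_model.bin
print(params, fp32_bytes, checkpoint_bytes - fp32_bytes)
```

The backbone accounts for roughly 133 million parameters. The remaining tens of kilobytes in the file fit a small QA head plus pickle metadata.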
outputs/intensive/checkpoint-23900/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e31e5f953e409735eb91c55c9c865412dd75d45a99a6a4a0b156b7ae7ac2be2e
+ size 532350957
outputs/intensive/checkpoint-23900/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f746b0186296f2263ff69d55705b37e5ffd3f32456e0e399cf3c6b97a0f7d4f
+ size 14503
outputs/intensive/checkpoint-23900/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5538239ac364de835557402682c0e056dcc1e97f5ee87f4a2958b61773eb625d
+ size 559
outputs/intensive/checkpoint-23900/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d3d346564087658131fb56d711b982ef1a8a44b9b23d2ae1417d098510e4849
+ size 623
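The checkpoint binaries above (`pytorch_model.bin`, `rng_state.pth`, `scaler.pt`, `scheduler.pt`) are committed as Git LFS pointer files: each diff shows a three-line `key value` stub rather than the data itself. A minimal parser for that stub format, following the spec URL in each file's `version` line; it only reads the pointer and does not fetch the object. The function name is illustrative.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer file into a {key: value} dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e31e5f953e409735eb91c55c9c865412dd75d45a99a6a4a0b156b7ae7ac2be2e
size 532350957
"""
info = parse_lfs_pointer(pointer)
algo, _, digest = info["oid"].partition(":")
print(algo, len(digest), int(info["size"]) / 1e6, "MB")
```

A checkout made without `git lfs pull` (or the equivalent Hub download) leaves these stubs on disk. Loading them with `torch.load` fails, so checking for the `version` line is a cheap guard.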
outputs/intensive/checkpoint-23900/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
outputs/intensive/checkpoint-23900/tokenizer.json ADDED
The diff for this file is too large to render.
 
outputs/intensive/checkpoint-23900/tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": false,
+   "mask_token": "[MASK]",
+   "name_or_path": "NlpHUST/electra-base-vn",
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "special_tokens_map_file": null,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "ElectraTokenizer",
+   "unk_token": "[UNK]"
+ }
outputs/intensive/checkpoint-23900/trainer_state.json ADDED
@@ -0,0 +1,468 @@
+ {
+   "best_metric": 60.008646779074795,
+   "best_model_checkpoint": "outputs/intensive/checkpoint-23900",
+   "epoch": 10.0,
+   "global_step": 23900,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.21,
+       "learning_rate": 4.150627615062761e-06,
+       "loss": 10.7765,
+       "step": 500
+     },
+     {
+       "epoch": 0.42,
+       "learning_rate": 8.334728033472804e-06,
+       "loss": 6.9262,
+       "step": 1000
+     },
+     {
+       "epoch": 0.63,
+       "learning_rate": 1.2518828451882846e-05,
+       "loss": 5.081,
+       "step": 1500
+     },
+     {
+       "epoch": 0.84,
+       "learning_rate": 1.670292887029289e-05,
+       "loss": 4.5427,
+       "step": 2000
+     },
+     {
+       "epoch": 1.0,
+       "eval_HasAns_exact": 34.392563769995675,
+       "eval_HasAns_f1": 49.07185024538246,
+       "eval_HasAns_total": 9252,
+       "eval_best_exact": 37.602680501513184,
+       "eval_best_exact_thresh": 6.36328125,
+       "eval_best_f1": 52.28196697689995,
+       "eval_best_f1_thresh": 6.36328125,
+       "eval_exact": 34.392563769995675,
+       "eval_f1": 49.07185024538246,
+       "eval_runtime": 157.3533,
+       "eval_samples_per_second": 59.293,
+       "eval_steps_per_second": 4.944,
+       "eval_total": 9252,
+       "step": 2390
+     },
+     {
+       "epoch": 1.05,
+       "learning_rate": 1.99023709902371e-05,
+       "loss": 4.2046,
+       "step": 2500
+     },
+     {
+       "epoch": 1.26,
+       "learning_rate": 1.9437470943747097e-05,
+       "loss": 3.8663,
+       "step": 3000
+     },
+     {
+       "epoch": 1.46,
+       "learning_rate": 1.897257089725709e-05,
+       "loss": 3.7485,
+       "step": 3500
+     },
+     {
+       "epoch": 1.67,
+       "learning_rate": 1.850767085076709e-05,
+       "loss": 3.6324,
+       "step": 4000
+     },
+     {
+       "epoch": 1.88,
+       "learning_rate": 1.804370060437006e-05,
+       "loss": 3.5147,
+       "step": 4500
+     },
+     {
+       "epoch": 2.0,
+       "eval_HasAns_exact": 44.32555123216602,
+       "eval_HasAns_f1": 58.396422080934855,
+       "eval_HasAns_total": 9252,
+       "eval_best_exact": 50.72416774751405,
+       "eval_best_exact_thresh": 6.75,
+       "eval_best_f1": 64.79503859628278,
+       "eval_best_f1_thresh": 6.75,
+       "eval_exact": 44.32555123216602,
+       "eval_f1": 58.396422080934855,
+       "eval_runtime": 158.1216,
+       "eval_samples_per_second": 59.005,
+       "eval_steps_per_second": 4.92,
+       "eval_total": 9252,
+       "step": 4780
+     },
+     {
+       "epoch": 2.09,
+       "learning_rate": 1.7579730357973036e-05,
+       "loss": 3.2078,
+       "step": 5000
+     },
+     {
+       "epoch": 2.3,
+       "learning_rate": 1.711576011157601e-05,
+       "loss": 2.7938,
+       "step": 5500
+     },
+     {
+       "epoch": 2.51,
+       "learning_rate": 1.665086006508601e-05,
+       "loss": 2.781,
+       "step": 6000
+     },
+     {
+       "epoch": 2.72,
+       "learning_rate": 1.6185960018596003e-05,
+       "loss": 2.7884,
+       "step": 6500
+     },
+     {
+       "epoch": 2.93,
+       "learning_rate": 1.5721059972105997e-05,
+       "loss": 2.6837,
+       "step": 7000
+     },
+     {
+       "epoch": 3.0,
+       "eval_HasAns_exact": 49.48119325551232,
+       "eval_HasAns_f1": 61.58352384463478,
+       "eval_HasAns_total": 9252,
+       "eval_best_exact": 66.6126242974492,
+       "eval_best_exact_thresh": 13.5234375,
+       "eval_best_f1": 78.71495488657156,
+       "eval_best_f1_thresh": 13.5234375,
+       "eval_exact": 49.48119325551232,
+       "eval_f1": 61.58352384463478,
+       "eval_runtime": 161.8926,
+       "eval_samples_per_second": 57.631,
+       "eval_steps_per_second": 4.806,
+       "eval_total": 9252,
+       "step": 7170
+     },
+     {
+       "epoch": 3.14,
+       "learning_rate": 1.5256159925615994e-05,
+       "loss": 2.2675,
+       "step": 7500
+     },
+     {
+       "epoch": 3.35,
+       "learning_rate": 1.479125987912599e-05,
+       "loss": 2.0381,
+       "step": 8000
+     },
+     {
+       "epoch": 3.56,
+       "learning_rate": 1.4326359832635986e-05,
+       "loss": 2.0343,
+       "step": 8500
+     },
+     {
+       "epoch": 3.77,
+       "learning_rate": 1.3861459786145978e-05,
+       "loss": 2.0476,
+       "step": 9000
+     },
+     {
+       "epoch": 3.97,
+       "learning_rate": 1.3396559739655974e-05,
+       "loss": 2.0245,
+       "step": 9500
+     },
+     {
+       "epoch": 4.0,
+       "eval_HasAns_exact": 51.14569822741029,
+       "eval_HasAns_f1": 61.58064803440931,
+       "eval_HasAns_total": 9252,
+       "eval_best_exact": 77.45352356247298,
+       "eval_best_exact_thresh": 17.703125,
+       "eval_best_f1": 87.88847336947204,
+       "eval_best_f1_thresh": 17.703125,
+       "eval_exact": 51.14569822741029,
+       "eval_f1": 61.58064803440931,
+       "eval_runtime": 159.5263,
+       "eval_samples_per_second": 58.486,
+       "eval_steps_per_second": 4.877,
+       "eval_total": 9252,
+       "step": 9560
+     },
+     {
+       "epoch": 4.18,
+       "learning_rate": 1.293165969316597e-05,
+       "loss": 1.4757,
+       "step": 10000
+     },
+     {
+       "epoch": 4.39,
+       "learning_rate": 1.2466759646675965e-05,
+       "loss": 1.4193,
+       "step": 10500
+     },
+     {
+       "epoch": 4.6,
+       "learning_rate": 1.200278940027894e-05,
+       "loss": 1.4397,
+       "step": 11000
+     },
+     {
+       "epoch": 4.81,
+       "learning_rate": 1.1537889353788937e-05,
+       "loss": 1.4482,
+       "step": 11500
+     },
+     {
+       "epoch": 5.0,
+       "eval_HasAns_exact": 54.31258106355383,
+       "eval_HasAns_f1": 62.505983151723,
+       "eval_HasAns_total": 9252,
+       "eval_best_exact": 84.15477734543883,
+       "eval_best_exact_thresh": 19.4375,
+       "eval_best_f1": 92.34817943360805,
+       "eval_best_f1_thresh": 19.4375,
+       "eval_exact": 54.31258106355383,
+       "eval_f1": 62.505983151723,
+       "eval_runtime": 110.395,
+       "eval_samples_per_second": 84.515,
+       "eval_steps_per_second": 7.047,
+       "eval_total": 9252,
+       "step": 11950
+     },
+     {
+       "epoch": 5.02,
+       "learning_rate": 1.107298930729893e-05,
+       "loss": 1.4318,
+       "step": 12000
+     },
+     {
+       "epoch": 5.23,
+       "learning_rate": 1.0608089260808926e-05,
+       "loss": 1.0225,
+       "step": 12500
+     },
+     {
+       "epoch": 5.44,
+       "learning_rate": 1.0143189214318922e-05,
+       "loss": 1.0562,
+       "step": 13000
+     },
+     {
+       "epoch": 5.65,
+       "learning_rate": 9.679218967921897e-06,
+       "loss": 1.0755,
+       "step": 13500
+     },
+     {
+       "epoch": 5.86,
+       "learning_rate": 9.214318921431893e-06,
+       "loss": 1.0455,
+       "step": 14000
+     },
+     {
+       "epoch": 6.0,
+       "eval_HasAns_exact": 56.91742325983571,
+       "eval_HasAns_f1": 63.24948077893548,
+       "eval_HasAns_total": 9252,
+       "eval_best_exact": 87.2568093385214,
+       "eval_best_exact_thresh": 25.234375,
+       "eval_best_f1": 93.58886685762127,
+       "eval_best_f1_thresh": 25.234375,
+       "eval_exact": 56.91742325983571,
+       "eval_f1": 63.24948077893548,
+       "eval_runtime": 109.5824,
+       "eval_samples_per_second": 85.141,
+       "eval_steps_per_second": 7.1,
+       "eval_total": 9252,
+       "step": 14340
+     },
+     {
+       "epoch": 6.07,
282
+ "learning_rate": 8.749418874941889e-06,
283
+ "loss": 0.9775,
284
+ "step": 14500
285
+ },
286
+ {
287
+ "epoch": 6.28,
288
+ "learning_rate": 8.284518828451885e-06,
289
+ "loss": 0.7959,
290
+ "step": 15000
291
+ },
292
+ {
293
+ "epoch": 6.49,
294
+ "learning_rate": 7.820548582054858e-06,
295
+ "loss": 0.7763,
296
+ "step": 15500
297
+ },
298
+ {
299
+ "epoch": 6.69,
300
+ "learning_rate": 7.3565783356578345e-06,
301
+ "loss": 0.841,
302
+ "step": 16000
303
+ },
304
+ {
305
+ "epoch": 6.9,
306
+ "learning_rate": 6.891678289167829e-06,
307
+ "loss": 0.8104,
308
+ "step": 16500
309
+ },
310
+ {
311
+ "epoch": 7.0,
312
+ "eval_HasAns_exact": 58.36575875486381,
313
+ "eval_HasAns_f1": 63.35313006280799,
314
+ "eval_HasAns_total": 9252,
315
+ "eval_best_exact": 89.95892779939473,
316
+ "eval_best_exact_thresh": 28.4375,
317
+ "eval_best_f1": 94.9462991073391,
318
+ "eval_best_f1_thresh": 28.4375,
319
+ "eval_exact": 58.36575875486381,
320
+ "eval_f1": 63.35313006280799,
321
+ "eval_runtime": 111.4095,
322
+ "eval_samples_per_second": 83.745,
323
+ "eval_steps_per_second": 6.983,
324
+ "eval_total": 9252,
325
+ "step": 16730
326
+ },
327
+ {
328
+ "epoch": 7.11,
329
+ "learning_rate": 6.426778242677825e-06,
330
+ "loss": 0.6864,
331
+ "step": 17000
332
+ },
333
+ {
334
+ "epoch": 7.32,
335
+ "learning_rate": 5.96187819618782e-06,
336
+ "loss": 0.6432,
337
+ "step": 17500
338
+ },
339
+ {
340
+ "epoch": 7.53,
341
+ "learning_rate": 5.496978149697816e-06,
342
+ "loss": 0.6308,
343
+ "step": 18000
344
+ },
345
+ {
346
+ "epoch": 7.74,
347
+ "learning_rate": 5.032078103207811e-06,
348
+ "loss": 0.6595,
349
+ "step": 18500
350
+ },
351
+ {
352
+ "epoch": 7.95,
353
+ "learning_rate": 4.567178056717806e-06,
354
+ "loss": 0.6227,
355
+ "step": 19000
356
+ },
357
+ {
358
+ "epoch": 8.0,
359
+ "eval_HasAns_exact": 59.262862083873756,
360
+ "eval_HasAns_f1": 63.181652613846246,
361
+ "eval_HasAns_total": 9252,
362
+ "eval_best_exact": 91.74232598357112,
363
+ "eval_best_exact_thresh": 29.84375,
364
+ "eval_best_f1": 95.6611165135438,
365
+ "eval_best_f1_thresh": 29.84375,
366
+ "eval_exact": 59.262862083873756,
367
+ "eval_f1": 63.181652613846246,
368
+ "eval_runtime": 111.1644,
369
+ "eval_samples_per_second": 83.93,
370
+ "eval_steps_per_second": 6.999,
371
+ "eval_total": 9252,
372
+ "step": 19120
373
+ },
374
+ {
375
+ "epoch": 8.16,
376
+ "learning_rate": 4.102278010227801e-06,
377
+ "loss": 0.5235,
378
+ "step": 19500
379
+ },
380
+ {
381
+ "epoch": 8.37,
382
+ "learning_rate": 3.638307763830777e-06,
383
+ "loss": 0.4937,
384
+ "step": 20000
385
+ },
386
+ {
387
+ "epoch": 8.58,
388
+ "learning_rate": 3.173407717340772e-06,
389
+ "loss": 0.5096,
390
+ "step": 20500
391
+ },
392
+ {
393
+ "epoch": 8.79,
394
+ "learning_rate": 2.7085076708507676e-06,
395
+ "loss": 0.5335,
396
+ "step": 21000
397
+ },
398
+ {
399
+ "epoch": 9.0,
400
+ "learning_rate": 2.2436076243607625e-06,
401
+ "loss": 0.5032,
402
+ "step": 21500
403
+ },
404
+ {
405
+ "epoch": 9.0,
406
+ "eval_HasAns_exact": 59.814094249891916,
407
+ "eval_HasAns_f1": 63.04596094446208,
408
+ "eval_HasAns_total": 9252,
409
+ "eval_best_exact": 92.19628188499784,
410
+ "eval_best_exact_thresh": 30.34375,
411
+ "eval_best_f1": 95.42814857956823,
412
+ "eval_best_f1_thresh": 30.34375,
413
+ "eval_exact": 59.814094249891916,
414
+ "eval_f1": 63.04596094446208,
415
+ "eval_runtime": 110.9375,
416
+ "eval_samples_per_second": 84.101,
417
+ "eval_steps_per_second": 7.013,
418
+ "eval_total": 9252,
419
+ "step": 21510
420
+ },
421
+ {
422
+ "epoch": 9.21,
423
+ "learning_rate": 1.778707577870758e-06,
424
+ "loss": 0.4126,
425
+ "step": 22000
426
+ },
427
+ {
428
+ "epoch": 9.41,
429
+ "learning_rate": 1.3138075313807533e-06,
430
+ "loss": 0.4188,
431
+ "step": 22500
432
+ },
433
+ {
434
+ "epoch": 9.62,
435
+ "learning_rate": 8.498372849837285e-07,
436
+ "loss": 0.4326,
437
+ "step": 23000
438
+ },
439
+ {
440
+ "epoch": 9.83,
441
+ "learning_rate": 3.858670385867039e-07,
442
+ "loss": 0.4378,
443
+ "step": 23500
444
+ },
445
+ {
446
+ "epoch": 10.0,
447
+ "eval_HasAns_exact": 60.008646779074795,
448
+ "eval_HasAns_f1": 62.92824770736451,
449
+ "eval_HasAns_total": 9252,
450
+ "eval_best_exact": 92.7583225248595,
451
+ "eval_best_exact_thresh": 31.09375,
452
+ "eval_best_f1": 95.67792345314943,
453
+ "eval_best_f1_thresh": 31.09375,
454
+ "eval_exact": 60.008646779074795,
455
+ "eval_f1": 62.92824770736451,
456
+ "eval_runtime": 107.3386,
457
+ "eval_samples_per_second": 86.921,
458
+ "eval_steps_per_second": 7.248,
459
+ "eval_total": 9252,
460
+ "step": 23900
461
+ }
462
+ ],
463
+ "max_steps": 23900,
464
+ "num_train_epochs": 10,
465
+ "total_flos": 7.493865187135488e+16,
466
+ "trial_name": null,
467
+ "trial_params": null
468
+ }
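The `learning_rate` values logged above, and in the sketch reader's log further down, fit a linear warmup to a peak of 2e-5 over the first 2390 steps followed by linear decay to 0 at step 23900. That is the shape produced by `transformers.get_linear_schedule_with_warmup`.

The peak rate and warmup length are inferred from the logged numbers, not read from `configs/train_vi_electra_base.yaml`, so treat them as assumptions. The logged step also runs a few steps ahead of the schedule: the rate logged at step 7500 matches scheduler step 7492. Skipped fp16 optimizer steps are one plausible cause, since a `scaler.pt` was saved, but that is a guess.

Here is a minimal sketch of the inferred schedule:

```python
def linear_warmup_decay_lr(
    step: int,
    peak_lr: float = 2e-5,      # assumed: inferred from the logged values
    warmup_steps: int = 2390,   # assumed: one epoch, i.e. 10% of max_steps
    total_steps: int = 23900,   # max_steps from trainer_state.json
) -> float:
    """Learning rate after `step` scheduler steps under linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps, never below 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Scheduler steps that line up with entries in the two trainer_state.json logs.
    for step in (500, 2498, 7492, 23900):
        print(step, linear_warmup_decay_lr(step))
```

Scheduler step 7492 gives about 1.52562e-05, the value logged at step 7500 in the intensive log. Step 2498 gives about 1.98996e-05, the value logged at step 2500 in the sketch log.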
outputs/intensive/checkpoint-23900/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a2ea25156a9af14389b6fb124369129c6a6bd78f82c00c33c31d87753dbaa92
+ size 3311
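The three lines added for `training_args.bin` are a Git LFS pointer file, not the binary content. The same holds for the other `.bin`, `.pth`, and `.pt` files below. A pointer has three lines:

- `version`: names the pointer spec.
- `oid`: the SHA-256 of the real object.
- `size`: the object's size in bytes.

Below is a small sketch for reading such a pointer. `parse_lfs_pointer` is a hypothetical helper, not part of this repository or of Git LFS:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, oid and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer file")
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:2a2ea25156a9af14389b6fb124369129c6a6bd78f82c00c33c31d87753dbaa92\n"
        "size 3311\n"
    )
    print(parse_lfs_pointer(pointer))
```

A clone may contain pointer text like this in place of, say, the roughly 535 MB `pytorch_model.bin`. That means the LFS objects were not fetched (for example, `git lfs` was not installed), and `git lfs pull` retrieves them.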
outputs/intensive/checkpoint-23900/vocab.txt ADDED
The diff for this file is too large to render.
 
outputs/intensive/nbest_predictions.json ADDED
@@ -0,0 +1,124 @@
+ {
+   "id-01": [
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": 5.49807071685791,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng",
+       "probability": 0.9822916984558105
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": 1.412649154663086,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean)",
+       "probability": 0.01651826500892639
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -1.957545280456543,
+       "text": "Nguy\u1ec5n",
+       "probability": 0.0005679467576555908
+     },
+     {
+       "start_logit": -2.662353992462158,
+       "end_logit": 5.49807071685791,
+       "text": "C\u01b0\u1eddng",
+       "probability": 0.0001623508578632027
+     },
+     {
+       "start_logit": -2.665097713470459,
+       "end_logit": 5.49807071685791,
+       "text": "\u0110\u1ee9c C\u01b0\u1eddng",
+       "probability": 0.00016190586029551923
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -3.320735454559326,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989",
+       "probability": 0.0001453054283047095
+     },
+     {
+       "start_logit": 1.1949851512908936,
+       "end_logit": 0.35255151987075806,
+       "text": "l\u00e0 m\u1ed9t nam rapper v\u00e0 nh\u1ea1c s\u0129 ng\u01b0\u1eddi Vi\u1ec7t Nam",
+       "probability": 4.477184484130703e-05
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -4.638460636138916,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng (",
+       "probability": 3.890457082889043e-05
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -5.202346324920654,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c",
+       "probability": 2.2136480765766464e-05
+     },
+     {
+       "start_logit": 0.23849983513355255,
+       "end_logit": 0.35255151987075806,
+       "text": "m\u1ed9t nam rapper v\u00e0 nh\u1ea1c s\u0129 ng\u01b0\u1eddi Vi\u1ec7t Nam",
+       "probability": 1.720317050057929e-05
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -6.049441814422607,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi",
+       "probability": 9.488983778283e-06
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -6.32877254486084,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean",
+       "probability": 7.176417511800537e-06
+     },
+     {
+       "start_logit": 6.045530319213867,
+       "end_logit": -6.484272003173828,
+       "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean),",
+       "probability": 6.142924121377291e-06
+     },
+     {
+       "start_logit": -2.662353992462158,
+       "end_logit": 1.412649154663086,
+       "text": "C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean)",
+       "probability": 2.7300973215460544e-06
+     },
+     {
+       "start_logit": -2.665097713470459,
+       "end_logit": 1.412649154663086,
+       "text": "\u0110\u1ee9c C\u01b0\u1eddng (sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean)",
+       "probability": 2.722619228734402e-06
+     },
+     {
+       "start_logit": -1.3381463289260864,
+       "end_logit": -1.3950865268707275,
+       "text": "",
+       "probability": 6.1928426475788e-07
+     },
+     {
+       "start_logit": -4.9331583976745605,
+       "end_logit": 1.412649154663086,
+       "text": "sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean)",
+       "probability": 2.8182577693769417e-07
+     },
+     {
+       "start_logit": -5.086843013763428,
+       "end_logit": 1.412649154663086,
+       "text": ")",
+       "probability": 2.416775828351092e-07
+     },
+     {
+       "start_logit": 1.1949851512908936,
+       "end_logit": -5.277959823608398,
+       "text": "l\u00e0 m\u1ed9t nam rapper",
+       "probability": 1.6058500307281065e-07
+     },
+     {
+       "start_logit": -5.693793296813965,
+       "end_logit": 1.412649154663086,
+       "text": "(sinh ng\u00e0y 13 th\u00e1ng 5 n\u0103m 1989 t\u1ea1i Qu\u1ea3ng Ninh, nh\u01b0ng qu\u00ea g\u1ed1c \u1edf \u00c2n Thi, H\u01b0ng Y\u00ean)",
+       "probability": 1.3171673174383614e-07
+     }
+   ]
+ }
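The `probability` field appears to be a softmax over each candidate's `start_logit + end_logit`, taken across all candidates of the example. That is what `postprocess_qa_predictions` in the Hugging Face question-answering example (`utils_qa.py`) computes, and the numbers above are consistent with it.

For instance, the top two candidates share a start logit, so their probability ratio should be exp(1.412649 - 5.498071) ≈ 0.01682. That matches 0.016518 / 0.982292. The entry with empty `text` is the null (no-answer) candidate.

Here is a sketch, assuming that post-processing:

```python
import math


def nbest_probabilities(candidates: list) -> list:
    """Softmax over start_logit + end_logit for a list of n-best candidates."""
    scores = [c["start_logit"] + c["end_logit"] for c in candidates]
    # Subtract the max score before exponentiating, for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # First two candidates for "id-01" in nbest_predictions.json.
    nbest = [
        {"start_logit": 6.045530319213867, "end_logit": 5.49807071685791},
        {"start_logit": 6.045530319213867, "end_logit": 1.412649154663086},
    ]
    probs = nbest_probabilities(nbest)
    # Softmax ratios do not depend on the remaining candidates, so this ratio
    # can be compared directly with the ratio of the stored probabilities.
    print(probs[1] / probs[0], 0.01651826500892639 / 0.9822916984558105)
```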
outputs/intensive/null_odds.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "id-01": -14.276833534240723
+ }
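The same Hugging Face post-processing can run with `version_2_with_negative=True`. In that mode, the value saved to `null_odds.json` is the null score minus the best non-null span score, where each score is `start_logit + end_logit`. The empty answer is predicted only when that difference exceeds `null_score_diff_threshold`.

Plugging in the empty-text candidate and the top span from `nbest_predictions.json` gives (-1.3381 - 1.3951) - (6.0455 + 5.4981) ≈ -14.2768. That matches the stored value, so the files appear to follow this convention. The negative value means the span outscored the no-answer option, which agrees with `predictions.json`.

Here is a sketch of that decision. The threshold default of 0.0 is an assumption, borrowed from that example script:

```python
def null_odds_and_answer(nbest: list, null_score_diff_threshold: float = 0.0):
    """Return (null score minus best span score, predicted answer text)."""

    def score(c: dict) -> float:
        return c["start_logit"] + c["end_logit"]

    # The null (no-answer) candidate is stored with empty text.
    null = next(c for c in nbest if c["text"] == "")
    best_span = max((c for c in nbest if c["text"] != ""), key=score)
    diff = score(null) - score(best_span)
    # Predict "no answer" only when the null score wins by more than the threshold.
    answer = "" if diff > null_score_diff_threshold else best_span["text"]
    return diff, answer


if __name__ == "__main__":
    nbest = [
        {"start_logit": 6.045530319213867, "end_logit": 5.49807071685791,
         "text": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng"},
        {"start_logit": -1.3381463289260864, "end_logit": -1.3950865268707275,
         "text": ""},
    ]
    print(null_odds_and_answer(nbest))
```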
outputs/intensive/predictions.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "id-01": "Nguy\u1ec5n \u0110\u1ee9c C\u01b0\u1eddng"
+ }
outputs/sketch/checkpoint-23900/config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "NlpHUST/electra-base-vn",
+   "architectures": [
+     "ElectraForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "embedding_size": 768,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "electra",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "problem_type": "single_label_classification",
+   "summary_activation": "gelu",
+   "summary_last_dropout": 0.1,
+   "summary_type": "first",
+   "summary_use_proj": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.21.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 62000
+ }
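This `config.json` can be checked roughly against the LFS `pytorch_model.bin` that follows by estimating the parameter count from the config. The sketch assumes:

- the layout of `ElectraForSequenceClassification` in `transformers`: embeddings, 12 encoder layers, and a two-layer classification head;
- no embedding projection, because `embedding_size` equals `hidden_size`;
- 2 labels, since the config lists no `id2label`.

It gives 133,658,882 parameters, or 534,635,528 bytes in float32. That is just under the 534,707,117-byte LFS object. The remaining ~70 KB would plausibly be serialization metadata and non-parameter buffers. `electra_seq_cls_params` is a hypothetical helper:

```python
def electra_seq_cls_params(cfg: dict, num_labels: int = 2) -> int:
    """Estimate trainable parameters of an ELECTRA sequence classifier from its config."""
    h = cfg["hidden_size"]
    if cfg["embedding_size"] != h:
        raise ValueError("sketch assumes no embeddings_project layer")
    i = cfg["intermediate_size"]
    # Word, position and token-type embeddings, plus the embedding LayerNorm.
    rows = cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    embeddings = rows * h + 2 * h
    # Per layer: Q, K, V and attention-output projections (weights + biases),
    # the two feed-forward dense layers, and two LayerNorms.
    attention = 4 * (h * h + h)
    feed_forward = (h * i + i) + (i * h + h)
    layer_norms = 2 * (2 * h)
    per_layer = attention + feed_forward + layer_norms
    # Classification head: dense (h -> h) followed by out_proj (h -> num_labels).
    head = (h * h + h) + (h * num_labels + num_labels)
    return embeddings + cfg["num_hidden_layers"] * per_layer + head


if __name__ == "__main__":
    config = {"hidden_size": 768, "embedding_size": 768, "intermediate_size": 3072,
              "vocab_size": 62000, "max_position_embeddings": 512,
              "type_vocab_size": 2, "num_hidden_layers": 12}
    n = electra_seq_cls_params(config)
    print(n, n * 4)  # parameter count and its size in float32 bytes
```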
outputs/sketch/checkpoint-23900/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee5ef0330b58fc9299f1f07ac1f057596d02f2350f66ed80eb8f1479aef56bec
+ size 534707117
outputs/sketch/checkpoint-23900/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37085a94bb8702116f42aba75288847f204d26e3445d8ef1e587654032735258
+ size 14503
outputs/sketch/checkpoint-23900/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d3fcb2fd6a93bcec231636ca78e55ddc7e0cf35aa3208c3cb78468921200b8d
+ size 559
outputs/sketch/checkpoint-23900/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4037b40aa9d735b8382af812ddac621c3546c12b2b539da6400f157fd74b61ae
+ size 623
outputs/sketch/checkpoint-23900/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
outputs/sketch/checkpoint-23900/tokenizer.json ADDED
The diff for this file is too large to render.
 
outputs/sketch/checkpoint-23900/tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": false,
+   "mask_token": "[MASK]",
+   "name_or_path": "NlpHUST/electra-base-vn",
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "special_tokens_map_file": null,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "ElectraTokenizer",
+   "unk_token": "[UNK]"
+ }
outputs/sketch/checkpoint-23900/trainer_state.json ADDED
@@ -0,0 +1,408 @@
+ {
+   "best_metric": 0.9979028875625101,
+   "best_model_checkpoint": "outputs/sketch/checkpoint-23900",
+   "epoch": 10.0,
+   "global_step": 23900,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.21,
+       "learning_rate": 4.184100418410042e-06,
+       "loss": 0.6451,
+       "step": 500
+     },
+     {
+       "epoch": 0.42,
+       "learning_rate": 8.368200836820084e-06,
+       "loss": 0.6334,
+       "step": 1000
+     },
+     {
+       "epoch": 0.63,
+       "learning_rate": 1.2552301255230125e-05,
+       "loss": 0.6162,
+       "step": 1500
+     },
+     {
+       "epoch": 0.84,
+       "learning_rate": 1.6719665271966527e-05,
+       "loss": 0.5225,
+       "step": 2000
+     },
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.8397642015005359,
+       "eval_f1": 0.7340330901974738,
+       "eval_precision": 0.8176773682124455,
+       "eval_recall": 0.6659134925758554,
+       "eval_runtime": 59.1615,
+       "eval_samples_per_second": 157.704,
+       "eval_steps_per_second": 13.15,
+       "step": 2390
+     },
+     {
+       "epoch": 1.05,
+       "learning_rate": 1.9899581589958163e-05,
+       "loss": 0.4783,
+       "step": 2500
+     },
+     {
+       "epoch": 1.26,
+       "learning_rate": 1.9434681543468157e-05,
+       "loss": 0.4164,
+       "step": 3000
+     },
+     {
+       "epoch": 1.46,
+       "learning_rate": 1.896978149697815e-05,
+       "loss": 0.4054,
+       "step": 3500
+     },
+     {
+       "epoch": 1.67,
+       "learning_rate": 1.8505811250581126e-05,
+       "loss": 0.3954,
+       "step": 4000
+     },
+     {
+       "epoch": 1.88,
+       "learning_rate": 1.8040911204091124e-05,
+       "loss": 0.4002,
+       "step": 4500
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.9234726688102894,
+       "eval_f1": 0.8800403225806451,
+       "eval_precision": 0.9176594253679047,
+       "eval_recall": 0.8453841187863137,
+       "eval_runtime": 106.4194,
+       "eval_samples_per_second": 87.672,
+       "eval_steps_per_second": 7.311,
+       "step": 4780
+     },
+     {
+       "epoch": 2.09,
+       "learning_rate": 1.75769409576941e-05,
+       "loss": 0.3342,
+       "step": 5000
+     },
+     {
+       "epoch": 2.3,
+       "learning_rate": 1.7112040911204093e-05,
+       "loss": 0.2597,
+       "step": 5500
+     },
+     {
+       "epoch": 2.51,
+       "learning_rate": 1.6647140864714087e-05,
+       "loss": 0.2773,
+       "step": 6000
+     },
+     {
+       "epoch": 2.72,
+       "learning_rate": 1.6182240818224085e-05,
+       "loss": 0.2942,
+       "step": 6500
+     },
+     {
+       "epoch": 2.93,
+       "learning_rate": 1.571827057182706e-05,
+       "loss": 0.2799,
+       "step": 7000
+     },
+     {
+       "epoch": 3.0,
+       "eval_accuracy": 0.9638799571275456,
+       "eval_f1": 0.9449076344613373,
+       "eval_precision": 0.9572706194104008,
+       "eval_recall": 0.9328599096191091,
+       "eval_runtime": 105.0123,
+       "eval_samples_per_second": 88.847,
+       "eval_steps_per_second": 7.409,
+       "step": 7170
+     },
+     {
+       "epoch": 3.14,
+       "learning_rate": 1.5253370525337054e-05,
+       "loss": 0.2081,
+       "step": 7500
+     },
+     {
+       "epoch": 3.35,
+       "learning_rate": 1.478847047884705e-05,
+       "loss": 0.202,
+       "step": 8000
+     },
+     {
+       "epoch": 3.56,
+       "learning_rate": 1.4323570432357044e-05,
+       "loss": 0.1983,
+       "step": 8500
+     },
+     {
+       "epoch": 3.77,
+       "learning_rate": 1.385960018596002e-05,
+       "loss": 0.2074,
+       "step": 9000
+     },
+     {
+       "epoch": 3.97,
+       "learning_rate": 1.3394700139470016e-05,
+       "loss": 0.2099,
+       "step": 9500
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.9785637727759914,
+       "eval_f1": 0.9672988881621974,
+       "eval_precision": 0.9801192842942346,
+       "eval_recall": 0.9548095545513234,
+       "eval_runtime": 142.4236,
+       "eval_samples_per_second": 65.509,
+       "eval_steps_per_second": 5.463,
+       "step": 9560
+     },
+     {
+       "epoch": 4.18,
+       "learning_rate": 1.2929800092980009e-05,
+       "loss": 0.1391,
+       "step": 10000
+     },
+     {
+       "epoch": 4.39,
+       "learning_rate": 1.2464900046490005e-05,
+       "loss": 0.1301,
+       "step": 10500
+     },
+     {
+       "epoch": 4.6,
+       "learning_rate": 1.2000929800092982e-05,
+       "loss": 0.1409,
+       "step": 11000
+     },
+     {
+       "epoch": 4.81,
+       "learning_rate": 1.1536029753602976e-05,
+       "loss": 0.1544,
+       "step": 11500
+     },
+     {
+       "epoch": 5.0,
+       "eval_accuracy": 0.9836012861736334,
+       "eval_f1": 0.9755552005112638,
+       "eval_precision": 0.9658335969629864,
+       "eval_recall": 0.9854744996772111,
+       "eval_runtime": 107.7483,
+       "eval_samples_per_second": 86.591,
+       "eval_steps_per_second": 7.221,
+       "step": 11950
+     },
+     {
+       "epoch": 5.02,
+       "learning_rate": 1.1071129707112971e-05,
+       "loss": 0.1549,
+       "step": 12000
+     },
+     {
+       "epoch": 5.23,
+       "learning_rate": 1.0606229660622967e-05,
+       "loss": 0.092,
+       "step": 12500
+     },
+     {
+       "epoch": 5.44,
+       "learning_rate": 1.014225941422594e-05,
+       "loss": 0.0973,
+       "step": 13000
+     },
+     {
+       "epoch": 5.65,
+       "learning_rate": 9.678289167828918e-06,
+       "loss": 0.122,
+       "step": 13500
+     },
+     {
+       "epoch": 5.86,
+       "learning_rate": 9.213389121338912e-06,
+       "loss": 0.1221,
+       "step": 14000
+     },
+     {
+       "epoch": 6.0,
+       "eval_accuracy": 0.9926045016077171,
+       "eval_f1": 0.9888799355358582,
+       "eval_precision": 0.9874476987447699,
+       "eval_recall": 0.9903163331181407,
+       "eval_runtime": 135.6226,
+       "eval_samples_per_second": 68.794,
+       "eval_steps_per_second": 5.737,
+       "step": 14340
+     },
+     {
+       "epoch": 6.07,
+       "learning_rate": 8.748489074848908e-06,
+       "loss": 0.1079,
+       "step": 14500
+     },
+     {
+       "epoch": 6.28,
+       "learning_rate": 8.283589028358903e-06,
+       "loss": 0.0886,
+       "step": 15000
+     },
+     {
+       "epoch": 6.49,
+       "learning_rate": 7.818688981868899e-06,
+       "loss": 0.0816,
+       "step": 15500
+     },
+     {
+       "epoch": 6.69,
+       "learning_rate": 7.354718735471874e-06,
+       "loss": 0.0723,
+       "step": 16000
+     },
+     {
+       "epoch": 6.9,
+       "learning_rate": 6.889818688981869e-06,
+       "loss": 0.0941,
+       "step": 16500
+     },
+     {
+       "epoch": 7.0,
+       "eval_accuracy": 0.9945337620578778,
+       "eval_f1": 0.9917966865047451,
+       "eval_precision": 0.9884578390509778,
+       "eval_recall": 0.9951581665590704,
+       "eval_runtime": 105.7234,
+       "eval_samples_per_second": 88.249,
+       "eval_steps_per_second": 7.359,
+       "step": 16730
+     },
+     {
+       "epoch": 7.11,
+       "learning_rate": 6.424918642491865e-06,
+       "loss": 0.066,
+       "step": 17000
+     },
+     {
+       "epoch": 7.32,
+       "learning_rate": 5.96001859600186e-06,
+       "loss": 0.0537,
+       "step": 17500
+     },
+     {
+       "epoch": 7.53,
+       "learning_rate": 5.496048349604835e-06,
+       "loss": 0.0484,
+       "step": 18000
+     },
+     {
+       "epoch": 7.74,
+       "learning_rate": 5.031148303114831e-06,
+       "loss": 0.0544,
+       "step": 18500
+     },
+     {
+       "epoch": 7.95,
+       "learning_rate": 4.566248256624826e-06,
+       "loss": 0.0673,
+       "step": 19000
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.9965702036441586,
+       "eval_f1": 0.9948403740728798,
+       "eval_precision": 0.9938788659793815,
+       "eval_recall": 0.9958037443511943,
+       "eval_runtime": 107.9506,
+       "eval_samples_per_second": 86.428,
+       "eval_steps_per_second": 7.207,
+       "step": 19120
+     },
+     {
+       "epoch": 8.16,
+       "learning_rate": 4.101348210134822e-06,
+       "loss": 0.035,
+       "step": 19500
+     },
+     {
+       "epoch": 8.37,
+       "learning_rate": 3.636448163644817e-06,
+       "loss": 0.0337,
+       "step": 20000
+     },
+     {
+       "epoch": 8.58,
+       "learning_rate": 3.1724779172477922e-06,
+       "loss": 0.0397,
+       "step": 20500
+     },
+     {
+       "epoch": 8.79,
+       "learning_rate": 2.707577870757787e-06,
+       "loss": 0.0398,
+       "step": 21000
+     },
+     {
+       "epoch": 9.0,
+       "learning_rate": 2.242677824267783e-06,
+       "loss": 0.0513,
+       "step": 21500
+     },
+     {
+       "epoch": 9.0,
+       "eval_accuracy": 0.9980707395498393,
+       "eval_f1": 0.9970939618986115,
+       "eval_precision": 0.9974160206718347,
+       "eval_recall": 0.9967721110393802,
+       "eval_runtime": 106.5837,
+       "eval_samples_per_second": 87.537,
+       "eval_steps_per_second": 7.299,
+       "step": 21510
+     },
+     {
+       "epoch": 9.21,
+       "learning_rate": 1.777777777777778e-06,
+       "loss": 0.0219,
+       "step": 22000
+     },
+     {
+       "epoch": 9.41,
+       "learning_rate": 1.3128777312877733e-06,
+       "loss": 0.029,
+       "step": 22500
+     },
+     {
+       "epoch": 9.62,
+       "learning_rate": 8.489074848907486e-07,
+       "loss": 0.0274,
+       "step": 23000
+     },
+     {
+       "epoch": 9.83,
+       "learning_rate": 3.8400743840074387e-07,
+       "loss": 0.0308,
+       "step": 23500
+     },
+     {
+       "epoch": 10.0,
+       "eval_accuracy": 0.9986066452304394,
+       "eval_f1": 0.9979028875625101,
+       "eval_precision": 0.9974201870364399,
+       "eval_recall": 0.9983860555196902,
+       "eval_runtime": 106.9592,
+       "eval_samples_per_second": 87.23,
+       "eval_steps_per_second": 7.274,
+       "step": 23900
+     }
+   ],
+   "max_steps": 23900,
+   "num_train_epochs": 10,
+   "total_flos": 7.54576195666944e+16,
+   "trial_name": null,
+   "trial_params": null
+ }
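The sketch reader is evaluated as a binary classifier, and each logged `eval_f1` matches the harmonic mean of the logged `eval_precision` and `eval_recall`. For the last evaluation, 2PR/(P+R) with P = 0.99742 and R = 0.99839 gives 0.99790.

`best_metric` equals that final, highest `eval_f1`, which is consistent with `best_model_checkpoint` pointing at step 23900. Whether `eval_f1` was configured as the selection metric is not shown here.

Below is a sketch that re-derives both facts from a loaded `trainer_state.json`. The function names are hypothetical:

```python
import json


def load_trainer_state(path: str) -> dict:
    """Read a Hugging Face Trainer trainer_state.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_sketch_state(state: dict, tol: float = 1e-6):
    """Verify every logged eval_f1; return (best eval step, best_metric matches it?)."""
    # Evaluation entries are the log_history items that carry eval_* keys.
    evals = [e for e in state["log_history"] if "eval_f1" in e]
    for e in evals:
        recomputed = f1_from_precision_recall(e["eval_precision"], e["eval_recall"])
        if abs(recomputed - e["eval_f1"]) > tol:
            raise ValueError(f"F1 mismatch at step {e['step']}")
    best = max(evals, key=lambda e: e["eval_f1"])
    return best["step"], best["eval_f1"] == state["best_metric"]
```

To run the check against this commit's file, call `check_sketch_state(load_trainer_state("outputs/sketch/checkpoint-23900/trainer_state.json"))`. The expected result is `(23900, True)`.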
outputs/sketch/checkpoint-23900/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0f222ab026cf85beb8d17adfee575acbc981e16c71caead2903f720c9273d20
+ size 3311
outputs/sketch/checkpoint-23900/vocab.txt ADDED
The diff for this file is too large to render.
 
outputs/sketch/cls_score.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "id-01": 10.994331359863281
+ }
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ transformers==4.21.1
+ wandb==0.13.3
+ torch==1.12.1
+ datasets
+ scikit-learn
+ streamlit==1.2.0
retro_reader/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .retro_reader import RetroReader
+
+ __all__ = ["constants", "retro_reader", "args"]
retro_reader/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (275 Bytes).

retro_reader/__pycache__/__init__.cpython-37.pyc ADDED
Binary file (269 Bytes).

retro_reader/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (273 Bytes).

retro_reader/__pycache__/base.cpython-310.pyc ADDED
Binary file (5.52 kB).

retro_reader/__pycache__/base.cpython-37.pyc ADDED
Binary file (5.39 kB).

retro_reader/__pycache__/base.cpython-38.pyc ADDED
Binary file (5.46 kB).

retro_reader/__pycache__/constants.cpython-310.pyc ADDED
Binary file (5.22 kB).

retro_reader/__pycache__/constants.cpython-37.pyc ADDED
Binary file (5.03 kB).

retro_reader/__pycache__/constants.cpython-38.pyc ADDED
Binary file (5.21 kB).

retro_reader/__pycache__/metrics.cpython-310.pyc ADDED
Binary file (1.55 kB).

retro_reader/__pycache__/metrics.cpython-37.pyc ADDED
Binary file (1.54 kB).

retro_reader/__pycache__/metrics.cpython-38.pyc ADDED
Binary file (1.55 kB).

retro_reader/__pycache__/preprocess.cpython-310.pyc ADDED
Binary file (5.47 kB).

retro_reader/__pycache__/preprocess.cpython-37.pyc ADDED
Binary file (5.67 kB).

retro_reader/__pycache__/preprocess.cpython-38.pyc ADDED
Binary file (5.41 kB).

retro_reader/__pycache__/retro_reader.cpython-310.pyc ADDED
Binary file (16.2 kB).

retro_reader/__pycache__/retro_reader.cpython-37.pyc ADDED
Binary file (16.1 kB).

retro_reader/__pycache__/retro_reader.cpython-38.pyc ADDED
Binary file (16.3 kB).