---
language: en
license: cc-by-nc-sa-4.0
datasets:
- RST-Discourse-Treebank
tags:
- rst-pointer
- feature-extraction
inference: false
model-index:
- name: RST Pointer
  results:
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Segmenter model test results (Trained)
      type: evaluation dataset
    metrics:
    - name: Precision
      type: precision
      value: 0.939
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Segmenter model test results (Reported)
      type: evaluation dataset
    metrics:
    - name: Precision
      type: precision
      value: 0.941
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Segmenter model test results (Trained)
      type: evaluation dataset
    metrics:
    - name: Recall
      type: recall
      value: 0.979
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Segmenter model test results (Reported)
      type: evaluation dataset
    metrics:
    - name: Recall
      type: recall
      value: 0.966
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Segmenter model test results (Trained)
      type: evaluation dataset
    metrics:
    - name: F1
      type: f1
      value: 0.959
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Segmenter model test results (Reported)
      type: evaluation dataset
    metrics:
    - name: F1
      type: f1
      value: 0.954
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Parser model test results (Trained)
      type: evaluation dataset
    metrics:
    - name: F1 Relation
      type: relation
      value: 0.813
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Parser model test results (Reported)
      type: evaluation dataset
    metrics:
    - name: F1 Relation
      type: relation
      value: 0.813
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Parser model test results (Trained)
      type: evaluation dataset
    metrics:
    - name: F1 Span
      type: span
      value: 0.966
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Parser model test results (Reported)
      type: evaluation dataset
    metrics:
    - name: F1 Span
      type: span
      value: 0.969
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Parser model test results (Trained)
      type: evaluation dataset
    metrics:
    - name: F1 Nuclearity
      type: nuclearity
      value: 0.909
  - task:
      type: feature-extraction
      name: RST-Pointer
    dataset:
      name: Parser model test results (Reported)
      type: evaluation dataset
    metrics:
    - name: F1 Nuclearity
      type: nuclearity
      value: 0.909
---

# Discourse Parsing
You can **test the model** at [SGNLP](https://sgnlp.aisingapore.net/discourse-parsing).<br />
For more information, please contact us at sg-nlp@aisingapore.org.

## Table of Contents
- [Model Details](#model-details)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)
- [Training](#training)
- [Model Parameters](#model-parameters)
- [Other Information](#other-information)
- [License](#license)

## Model Details
**Model Name:** RST-Pointer
- **Description:** This is a pointer network-based segmenter and parser trained to identify the relations between different segments of a sentence according to Rhetorical Structure Theory (RST).
- **Paper:** A Unified Linear-Time Framework for Sentence-Level Discourse Parsing. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, July 2019 (pp. 4190-4200).
- **Author(s):** Lin, X., Joty, S., Jwalapuram, P., & Bari, M. S. (2019).
- **URL:** https://aclanthology.org/P19-1410/
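
The parser metrics in this card's metadata (F1 Span, F1 Nuclearity, F1 Relation) compare the internal nodes of a predicted discourse tree with those of a gold tree, in the spirit of RST-Parseval. The following is a minimal sketch of that idea, not the evaluation code behind the reported numbers: it assumes each tree is given as a set of hypothetical `(start_edu, end_edu, nuclearity, relation)` tuples over the same gold segmentation, and the toy trees and labels are made up for illustration.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical node representation: (start_edu, end_edu, nuclearity, relation).
Node = Tuple[int, int, str, str]


def f1(pred: Set[tuple], gold: Set[tuple]) -> float:
    """Harmonic mean of precision and recall over exactly matching items."""
    matched = len(pred & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def parseval_scores(pred: Set[Node], gold: Set[Node]) -> Dict[str, float]:
    """Span, nuclearity and relation F1, each on a different projection of the nodes."""
    views: Dict[str, Callable[[Node], tuple]] = {
        "span": lambda n: (n[0], n[1]),              # boundaries only
        "nuclearity": lambda n: (n[0], n[1], n[2]),  # boundaries + nuclearity label
        "relation": lambda n: (n[0], n[1], n[3]),    # boundaries + relation label
    }
    return {name: f1({view(n) for n in pred}, {view(n) for n in gold})
            for name, view in views.items()}


# Toy three-EDU sentence; labels are illustrative, not model output.
gold = {(0, 2, "NS", "Elaboration"), (0, 1, "NN", "Joint")}
pred = {(0, 2, "NS", "Elaboration"), (0, 1, "NS", "Joint")}

print(parseval_scores(pred, gold))
# {'span': 1.0, 'nuclearity': 0.5, 'relation': 1.0}
```

The same harmonic mean links the segmenter metrics: 2 × 0.939 × 0.979 / (0.939 + 0.979) ≈ 0.959, which matches the trained segmenter F1 reported above.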

## How to Get Started With the Model

### Install Python Package
SGnlp is an initiative by AI Singapore's NLP Hub. It aims to bridge the gap between research and industry, promote translational research, and encourage adoption of NLP techniques in the industry. <br><br> Various NLP models, other than discourse parsing, are available in the Python package. You can try them out at [SGNLP-Demo](https://sgnlp.aisingapore.net/) | [SGNLP-Github](https://github.com/aisingapore/sgnlp).

```shell
pip install sgnlp
```

### Examples
For the full code (including RST-Pointer), please refer to the [GitHub repository](https://github.com/aisingapore/sgnlp). <br> Alternatively, you can try out the [demo](https://sgnlp.aisingapore.net/discourse-parsing) for discourse parsing.

Example of discourse parsing with RST-Pointer:
```python
from sgnlp.models.rst_pointer import (
    RstPointerParserConfig,
    RstPointerParserModel,
    RstPointerSegmenterConfig,
    RstPointerSegmenterModel,
    RstPreprocessor,
    RstPostprocessor
)

# Load processors and models
preprocessor = RstPreprocessor()
postprocessor = RstPostprocessor()

# Segmenter: splits each sentence into elementary discourse units (EDUs)
segmenter_config = RstPointerSegmenterConfig.from_pretrained(
    'https://storage.googleapis.com/sgnlp/models/rst_pointer/segmenter/config.json')
segmenter = RstPointerSegmenterModel.from_pretrained(
    'https://storage.googleapis.com/sgnlp/models/rst_pointer/segmenter/pytorch_model.bin',
    config=segmenter_config)
segmenter.eval()

# Parser: builds a discourse tree over the segmented units
parser_config = RstPointerParserConfig.from_pretrained(
    'https://storage.googleapis.com/sgnlp/models/rst_pointer/parser/config.json')
parser = RstPointerParserModel.from_pretrained(
    'https://storage.googleapis.com/sgnlp/models/rst_pointer/parser/pytorch_model.bin',
    config=parser_config)
parser.eval()

sentences = [
    "Thumbs began to be troublesome about 4 months ago and I made an appointment with the best hand surgeon in the "
    "Valley to see if my working activities were the problem.",
    "Every rule has exceptions, but the tragic and too-common tableaux of hundreds or even thousands of people "
    "snake-lining up for any task with a paycheck illustrates a lack of jobs, not laziness."
]

# Tokenize the sentences and record their lengths
tokenized_sentences_ids, tokenized_sentences, lengths = preprocessor(sentences)

# Predict where each discourse unit ends
segmenter_output = segmenter(tokenized_sentences_ids, lengths)
end_boundaries = segmenter_output.end_boundaries

# Predict how the discourse units are split into a tree
parser_output = parser(tokenized_sentences_ids, end_boundaries, lengths)

# Combine boundaries and splits into discourse trees
trees = postprocessor(sentences=sentences, tokenized_sentences=tokenized_sentences,
                      end_boundaries=end_boundaries,
                      discourse_tree_splits=parser_output.splits)
```
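
The segmenter's `end_boundaries` mark where each discourse unit ends. A minimal sketch of turning boundaries back into text spans, assuming (the card does not specify the format) that each sentence's boundaries are inclusive indices of the last token of every unit. The `split_into_edus` helper and the tokens are hypothetical; the real preprocessor may produce different tokens.

```python
from typing import List, Sequence


def split_into_edus(tokens: Sequence[str], end_boundaries: Sequence[int]) -> List[str]:
    """Group tokens into units, assuming each boundary is the inclusive
    index of a unit's last token (hypothetical format)."""
    edus: List[str] = []
    start = 0
    for end in sorted(end_boundaries):
        edus.append(" ".join(tokens[start:end + 1]))
        start = end + 1
    # Keep any trailing tokens not closed by a boundary.
    if start < len(tokens):
        edus.append(" ".join(tokens[start:]))
    return edus


tokens = ["The", "rule", "has", "exceptions", ",", "but", "it", "holds", "."]
print(split_into_edus(tokens, [4, 8]))
# ['The rule has exceptions ,', 'but it holds .']
```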

## Training
The model is trained on the RST Discourse Treebank, which is a licensed dataset.
- **Training Config:** [Segmenter](https://storage.googleapis.com/sgnlp/models/rst_pointer/segmenter/training_config.json) | [Parser](https://storage.googleapis.com/sgnlp/models/rst_pointer/parser/training_config.json)

### Training Results
- **Training Time (Segmenter):** ~2 hours for 100 epochs on a single V100 GPU.
- **Training Time (Parser):** ~6 hours for 200 epochs on a single V100 GPU.
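
Dividing these approximate totals gives a rough per-epoch cost, which can help when planning a shorter or longer run. A small sketch of that arithmetic; the helper name is hypothetical, and actual epoch times will vary with hardware and configuration.

```python
def minutes_per_epoch(total_hours: float, epochs: int) -> float:
    """Average wall-clock minutes per epoch from an approximate total."""
    return total_hours * 60 / epochs


segmenter = minutes_per_epoch(2, 100)  # ~2 hours / 100 epochs
parser = minutes_per_epoch(6, 200)     # ~6 hours / 200 epochs
print(f"segmenter: {segmenter:.1f} min/epoch, parser: {parser:.1f} min/epoch")
# segmenter: 1.2 min/epoch, parser: 1.8 min/epoch
```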

## Model Parameters
- **Model Weights:** [Segmenter](https://storage.googleapis.com/sgnlp/models/rst_pointer/segmenter/pytorch_model.bin) | [Parser](https://storage.googleapis.com/sgnlp/models/rst_pointer/parser/pytorch_model.bin)
- **Model Config:** [Segmenter](https://storage.googleapis.com/sgnlp/models/rst_pointer/segmenter/config.json) | [Parser](https://storage.googleapis.com/sgnlp/models/rst_pointer/parser/config.json)
- **Model Inputs:** A sentence (multiple sentences can be passed as a list).
- **Model Outputs:** A discourse parse tree.
- **Model Size:** ~362MB for the segmenter model, ~361MB for the parser model.
- **Model Inference Info:** Not available.
- **Usage Scenarios:** Constructing additional features for downstream NLP tasks.
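
As one illustration of that usage scenario, a parse can be reduced to a fixed-length feature vector by counting relation labels. This is a hypothetical sketch, not part of the sgnlp API: it assumes the tree has already been flattened into a list of relation labels, and the label vocabulary and `relation_features` helper are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def relation_features(relations: Iterable[str], vocabulary: List[str]) -> Dict[str, float]:
    """Normalized bag-of-relations features over a fixed label vocabulary.
    Labels outside the vocabulary are ignored."""
    counts = Counter(relations)
    total = sum(counts[label] for label in vocabulary)
    return {label: (counts[label] / total if total else 0.0) for label in vocabulary}


vocab = ["Elaboration", "Contrast", "Joint"]
print(relation_features(["Elaboration", "Contrast", "Elaboration", "Joint"], vocab))
# {'Elaboration': 0.5, 'Contrast': 0.25, 'Joint': 0.25}
```

A fixed vocabulary keeps the feature vector the same length across documents, so it can be concatenated with other features in a downstream model.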

## Other Information
- **Original Code:** [link](https://github.com/shawnlimn/UnifiedParser_RST)

## License
- **Model:** Released under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
- **Code:** Released under [MIT License](https://choosealicense.com/licenses/mit)