dennlinger committed 26f1aa0 (parent: 5aab424)
Update YAML and add link to dataset.

README.md (changed)
@@ -1,9 +1,23 @@
+---
+language:
+- en
+tags:
+- sentence-similarity
+- text-classification
+datasets:
+- wiki-paragraphs
+metrics:
+- f1
+license: mit
+---
+
 # BERT-Wiki-Paragraphs
 
 Authors: Satya Almasian\*, Dennis Aumiller\*, Lucienne-Sophie Marmé, Michael Gertz
 Contact us at `<lastname>@informatik.uni-heidelberg.de`
 Details of the training method can be found in our paper [Structural Text Segmentation of Legal Documents](https://arxiv.org/abs/2012.03619).
 The training procedure follows the same setup, but this model is trained on Wikipedia instead of legal documents.
+Find the associated training data here: [wiki-paragraphs](https://huggingface.co/datasets/dennlinger/wiki-paragraphs)
 
 Training is performed in a weakly supervised fashion to determine whether two paragraphs topically belong together.
 We use automatically generated samples from Wikipedia for training, where paragraphs from within the same section are assumed to be topically coherent.
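The weak labeling described in the README (paragraphs from the same section are treated as topically coherent) can be illustrated with the Python sketch below. This is a minimal, hypothetical illustration, not the authors' data pipeline. It assumes a document is already available as a list of sections, each section being a list of paragraph strings. It also assumes that positive pairs are consecutive paragraphs within one section and that negative pairs straddle a section boundary. The function name `make_paragraph_pairs` and the toy data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a document is a list of sections,
# and each section is a list of paragraph strings.
Document = List[List[str]]
Pair = Tuple[str, str, int]


def make_paragraph_pairs(doc: Document) -> List[Pair]:
    """Create weakly labeled paragraph pairs from one document.

    Assumed labeling scheme (illustrative only):
    - label 1: consecutive paragraphs inside the same section
    - label 0: last paragraph of a section + first paragraph of the next one
    """
    # Drop empty sections so that boundaries are formed between
    # the remaining non-empty neighboring sections.
    sections = [section for section in doc if section]
    pairs: List[Pair] = []

    # Positive pairs: both paragraphs come from the same section.
    for section in sections:
        for first, second in zip(section, section[1:]):
            pairs.append((first, second, 1))

    # Negative pairs: the pair crosses a section boundary.
    for prev_section, next_section in zip(sections, sections[1:]):
        pairs.append((prev_section[-1], next_section[0], 0))

    return pairs


if __name__ == "__main__":
    toy_doc = [
        ["P1a", "P1b", "P1c"],  # section 1
        ["P2a", "P2b"],         # section 2
    ]
    for pair in make_paragraph_pairs(toy_doc):
        print(pair)
```

Each `(first, second, label)` triple could then serve as a sentence-pair input with a binary target for a BERT classifier. How the released model actually samples or balances positive and negative pairs is not specified in this README, so treat the scheme above as an assumption.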