dennlinger committed
Commit 26f1aa0
1 Parent(s): 5aab424

Update YAML and add link to dataset.

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -1,9 +1,23 @@
+ ---
+ language:
+ - en
+ tags:
+ - sentence-similarity
+ - text-classification
+ datasets:
+ - wiki-paragraphs
+ metrics:
+ - f1
+ license: mit
+ ---
+
  # BERT-Wiki-Paragraphs

  Authors: Satya Almasian\*, Dennis Aumiller\*, Lucienne-Sophie Marmé, Michael Gertz
  Contact us at `<lastname>@informatik.uni-heidelberg.de`
  Details for the training method can be found in our work [Structural Text Segmentation of Legal Documents](https://arxiv.org/abs/2012.03619).
  The training procedure follows the same setup, but we substitute legal documents for Wikipedia in this model.
+ Find the associated training data here: [wiki-paragraphs](https://huggingface.co/datasets/dennlinger/wiki-paragraphs)

  Training is performed in a weakly-supervised fashion to determine whether paragraphs topically belong together or not.
  We utilize automatically generated samples from Wikipedia for training, where paragraphs from within the same section are assumed to be topically coherent.
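
Once published on the Hub, a pair classifier like this can be queried through the standard `transformers` text-classification pipeline. A minimal sketch, assuming the model id `dennlinger/bert-wiki-paragraphs` and that the two paragraphs are joined with a `[SEP]` token (both assumptions; adjust to the actual model card):

```python
from transformers import pipeline

# Model id assumed from the repository name; substitute the actual id if it differs.
pipe = pipeline("text-classification", model="dennlinger/bert-wiki-paragraphs")

# Two paragraphs go in as a single string joined by [SEP]; the classifier
# predicts whether they are topically coherent (i.e., from the same section).
result = pipe(
    "The Eiffel Tower is a wrought-iron lattice tower in Paris."
    " [SEP] "
    "It is named after the engineer Gustave Eiffel, whose company designed and built it."
)
print(result)  # e.g. [{'label': 'LABEL_1', 'score': ...}]; label semantics follow the training setup
```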