qanastek committed on
Commit
dc5558b
1 Parent(s): 9805133

Update README.md

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -10,10 +10,9 @@ widget:
 
  # POET: A French Extended Part-of-Speech Tagger
 
- - Corpora: [UD_FRENCH_TREEBANKS](https://github.com/qanastek/UD_FRENCH_GSD_PLUS)
- - Model: [Flair](https://www.aclweb.org/anthology/C18-1139.pdf)
+ - Corpora: [ANTILLES](https://github.com/qanastek/ANTILLES)
  - Embeddings: [FastText](https://fasttext.cc/)
- - Sequence Labelling: [LSTM-CRF](https://arxiv.org/abs/1011.4088)
+ - Sequence Labelling: [Bi-LSTM-CRF](https://arxiv.org/abs/1011.4088)
  - Number of Epochs: 115
 
  **People Involved**
@@ -52,13 +51,13 @@ Output:
 
  ## Training data
 
- `UD_FRENCH_GSD_Plus` is a part-of-speech tagging corpora based on [UD_French-GSD](https://universaldependencies.org/treebanks/fr_gsd/index.html) which was originally created in 2015 and is based on the [universal dependency treebank v2.0](https://github.com/ryanmcd/uni-dep-tb).
+ `ANTILLES` is a part-of-speech tagging corpora based on [UD_French-GSD](https://universaldependencies.org/treebanks/fr_gsd/index.html) which was originally created in 2015 and is based on the [universal dependency treebank v2.0](https://github.com/ryanmcd/uni-dep-tb).
 
  Originally, the corpora consists of 400,399 words (16,341 sentences) and had 17 different classes. Now, after applying our tags augmentation we obtain 60 different classes which add linguistic and semantic information such as the gender, number, mood, person, tense or verb form given in the different CoNLL-03 fields from the original corpora.
 
  We based our tags on the level of details given by the [LIA_TAGG](http://pageperso.lif.univ-mrs.fr/frederic.bechet/download.html) statistical POS tagger written by [Frédéric Béchet](http://pageperso.lif.univ-mrs.fr/frederic.bechet/index-english.html) in 2001.
 
- The corpora used for this model is available on [Github](https://github.com/qanastek/UD_FRENCH_GSD_PLUS) at the [CoNLL-U format](https://universaldependencies.org/format.html).
+ The corpora used for this model is available on [Github](https://github.com/qanastek/ANTILLES) at the [CoNLL-U format](https://universaldependencies.org/format.html).
 
  Training data are fed to the model as free language and doesn't pass a normalization phase. Thus, it's made the model case and punctuation sensitive.
 
@@ -138,7 +137,7 @@ PRON VERB SCONJ ADP CCONJ DET NOUN ADJ AUX ADV PUNCT PROPN NUM SYM PART X INTJ
 
  ## Evaluation results
 
- The test corpora used for this evaluation is available on [Github](https://github.com/qanastek/UD_FRENCH_GSD_PLUS/blob/main/UD_FRENCH_GSD_PLUS/fr_gsd-ud-plus-test.conllu).
+ The test corpora used for this evaluation is available on [Github](https://github.com/qanastek/ANTILLES/blob/main/ANTILLES/test.conllu).
 
  ```plain
  Results: