PleIAs/Segmentext

Pclanglais committed on Jun 28

Commit 013b723

1 Parent(s): a2d551d

Update README.md

Files changed (1)

README.md CHANGED

@@ -6,7 +6,10 @@ Estienne was trained on 2,000 example of manually annotated texts, excerpted at
 Given the diversity of the corpus, Estienne should work out on diverse document formats in European languages.
-As Deberta remove newline by default and has no support for it in the tokenizer, they should be replaced by pilcrows (¶)
+The model is named in reference to the humanist Henri Estienne who introduced many practices of text segmentation still in use in scholarly edition today.
+## Use
+As Deberta remove newline by default and has no support for it in the tokenizer, they should be replaced by pilcrows (¶).
 Estienne supports the following segmentations:
 * **Text**
@@ -21,4 +24,4 @@ Estienne supports the following segmentations:
 * **Date** - statement of date and time, common in letters and newspaper articles.
 * **Keyword** - list of keywords, especially common in scientific publications.
-The model is named in reference to the humanist Henri Estienne who introduced many practices of text segmentation still in use in scholarly edition today.
+## Example
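
The preprocessing step described under `## Use` (replacing newlines with pilcrows before tokenization) could be sketched roughly as below. This is not part of the commit, and the helper names `newlines_to_pilcrows` and `pilcrows_to_newlines` are hypothetical. The README does not say whether the pilcrow should be padded with spaces or how `\r\n` endings should be handled, so the sketch assumes one bare pilcrow per line break.

```python
# Minimal sketch, assuming each line break maps to exactly one bare pilcrow.
# The helper names are illustrative and do not come from the model repository.

PILCROW = "¶"


def newlines_to_pilcrows(text: str) -> str:
    """Replace every line break with a pilcrow so the tokenizer does not drop it."""
    # Normalize Windows (\r\n) and legacy Mac (\r) endings first,
    # so that one line break always yields one pilcrow.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", PILCROW)


def pilcrows_to_newlines(text: str) -> str:
    """Restore line breaks in text that was marked with pilcrows."""
    return text.replace(PILCROW, "\n")


if __name__ == "__main__":
    letter = "Paris, 12 March 1897\nDear Sir,\nI have received your letter."
    print(newlines_to_pilcrows(letter))
```

The converted string is what would then be passed to the tokenizer. The reverse helper is only useful if the original line breaks need to be restored in the segmented output, and it assumes the input text contained no pilcrows of its own.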