Spaces: Emanuel/porttagger

Emanuel Huber committed on Dec 8, 2022

Commit c541339 • 1 Parent(s): f48f476

Updated project description

Files changed (1)

top.html +16 -10

@@ -3,17 +3,23 @@
         <h1 style="font-weight: 900; font-size: 3rem; margin: 20px;">
             Porttagger
         </h1>
-        <p class="slogan">A Brazilian Portuguese part-of-speech tagger according to Universal
-            Dependencies</p>
+        <p class="slogan">A Brazilian Portuguese part of speech tagger according to the <a
+                href="https://universaldependencies.org/">Universal Dependencies</a> model
+        </p>
     </div>
     <p style="margin-top: 30px; margin-bottom: 10px; font-size: 94%; text-align: left;">
-        Porttagger (Porttinari Part-Of-Speech) tagger was trained on the <a
-            href="https://sites.google.com/icmc.usp.br/poetisa/resources-and-tools">Porttinari-base</a> corpus which is
-        a collection of news extracted from the Folha de São Paulo newspaper site. The trained model is a fine-tuned
-        version
-        of <a href="https://huggingface.co/neuralmind/bert-base-portuguese-cased">Bertimbau</a> that receives tokens and
-        outputs part-of-speech tags. Since the model expects a sequence of
-        tokens
-        for its inputs, <a src="https://spacy.io/models/pt">Spacy's</a> tokenization is used to tokenize the input text.
+        Porttagger is a state of the art part of speech tagger for Brazilian Portuguese that automatically assigns
+        morphosyntactic classes to the words of sentences, following the Universal Dependencies international model. You
+        may provide single sentences or multiple sentences (using plain text files with several sentences) to be tagged.
+        You may also choose which trained model to use. The options include a model trained on news texts (using the
+        <a href="https://sites.google.com/icmc.usp.br/poetisa/resources-and-tools">Porttinari-base</a> corpus), on stock
+        market tweets (from the <a
+            href="https://www.kaggle.com/datasets/fernandojvdasilva/stock-tweets-ptbr-emotions">DANTE</a> corpus), on
+        academic texts from the oil & gas
+        domain (from the <a
+            href="https://github.com/UniversalDependencies/UD_Portuguese-PetroGold/blob/master/README.md">PetroGold</a>
+        corpus), and on all of them together. To the interested reader, this initiative is
+        part of the <a href="https://sites.google.com/icmc.usp.br/poetisa/">POeTiSA</a> project, where much more
+        information is available.
     </p>
 </div>
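The removed description says the fine-tuned BERTimbau model receives tokens (produced by spaCy's Portuguese tokenizer) and outputs part-of-speech tags. BERT-style models further split each word into subword pieces, so a word-level tagger has to map subword predictions back onto the original tokens. The sketch below assumes the common first-subword strategy, in which each word takes the label predicted for its first piece. The commit does not say how Porttagger itself handles this, and the function name align_word_predictions and the sample labels are hypothetical. The word_ids argument has the shape returned by the word_ids() method of a Hugging Face fast tokenizer's encoding (None for special tokens such as [CLS] and [SEP]).

```python
from typing import List, Optional, Sequence


def align_word_predictions(
    word_ids: Sequence[Optional[int]],
    subword_labels: Sequence[str],
) -> List[str]:
    """Collapse per-subword tag predictions into one tag per input word.

    Hypothetical helper using the first-subword strategy (an assumption):
    word_ids[i] is the index of the word that subword i belongs to, or None
    for special tokens; only the first subword of each word keeps its label.
    """
    if len(word_ids) != len(subword_labels):
        raise ValueError("word_ids and subword_labels must have the same length")

    tags: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, subword_labels):
        # A new word starts whenever the word index changes; special tokens
        # (None) never produce a tag.
        if word_id is not None and word_id != previous:
            tags.append(label)
        previous = word_id
    return tags


if __name__ == "__main__":
    # Hypothetical input: "Porttagger" is split into two subword pieces.
    words = ["O", "Porttagger", "funciona", "."]
    word_ids = [None, 0, 1, 1, 2, 3, None]
    subword_labels = ["X", "DET", "PROPN", "PROPN", "VERB", "PUNCT", "X"]
    for word, tag in zip(words, align_word_predictions(word_ids, subword_labels)):
        print(word, tag)
```

Taking the first piece is only one option; averaging the logits over all pieces of a word before choosing a tag is another common choice.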