Space: somosnlp-hackathon-2022/readability-assessment-spanish

feralvam committed on Apr 3, 2022

Commit 0352b62 (parent: b8feb8c)

Update app.py

Files changed (1)

app.py +4 -4

@@ -26,7 +26,7 @@ We aim to contribute to the development of **neural models for readability asses
 We curated a new dataset that combines existing corpora for readability assessment (i.e. [Newsela](https://newsela.com/data)) and texts scraped from webpages aimed at learners of Spanish as a second language. Texts in the Newsela corpus contain the grade level (according to the USA educational system) that they were written for. In the case of scraped texts, we selected webpages that explicitly indicated the [CEFR](https://en.wikipedia.org/wiki/Common_European_Framework_of_Reference_for_Languages) level that each text belongs to.
-Each text has two readability labels, according to the following mapping:
+In our dataset, each text has two readability labels, according to the following mapping:
 |                  | 2-class      |              | 3-class         |                 |                  |
 |------------------|--------------|--------------|-----------------|-----------------|------------------|
@@ -36,14 +36,14 @@ Each text has two readability labels, according to the following mapping:
 In addition, texts in the dataset could be too long to fit in a model. As such, we created two versions of the dataset, dividing each text into [sentences](https://huggingface.co/datasets/hackathon-pln-es/readability-es-sentences) and [paragraphs](https://huggingface.co/datasets/hackathon-pln-es/readability-es-paragraphs).
-We also scraped several texts from the ["Corpus de Aprendices del Español" (CAES)](http://galvan.usc.es/caes/). However, due to the time constraints, we leave experiments with it for future work. The data is available [here](https://huggingface.co/datasets/hackathon-pln-es/readability-es-caes).
+We also scraped several texts from the ["Corpus de Aprendices del Español" (CAES)](http://galvan.usc.es/caes/). However, due to the time constraints, we leave experiments with it for future work. This data is available [here](https://huggingface.co/datasets/hackathon-pln-es/readability-es-caes).
 ### Models
 Our models are based on [BERTIN](https://huggingface.co/bertin-project). We fine-tuned [bertin-roberta-base-spanish](https://huggingface.co/bertin-project/bertin-roberta-base-spanish) in the different versions of our collected dataset. The following models are available:
-- [2-class sentence-level ](https://huggingface.co/hackathon-pln-es/readability-es-sentences)
-- [2-class paragraph-level ](https://huggingface.co/hackathon-pln-es/readability-es-paragraphs)
+- [2-class sentence-level](https://huggingface.co/hackathon-pln-es/readability-es-sentences)
+- [2-class paragraph-level](https://huggingface.co/hackathon-pln-es/readability-es-paragraphs)
 - [3-class sentence-level](https://huggingface.co/hackathon-pln-es/readability-es-3class-sentences)
 - [3-class paragraph-level](https://huggingface.co/hackathon-pln-es/readability-es-3class-paragraphs)