feralvam committed on
Commit
e2f76e0
1 Parent(s): 4b45916

Update app.py

  1. app.py +27 -4
app.py CHANGED
@@ -22,16 +22,39 @@ As such, developing models that could estimate a text's readability by "looking
  We aim to contribute to the development of **neural models for readability assessment for Spanish**, following previous work for [English](https://aclanthology.org/2021.cl-1.6/) and [Filipino](https://aclanthology.org/2021.ranlp-1.69/).


- ### More Information

- Details about how we trained these models can be found in our [report](https://wandb.ai/readability-es/readability-es/reports/Texts-Readability-Analysis-for-Spanish--VmlldzoxNzU2MDUx).


  ### Team

  - [Laura Vásquez-Rodríguez](https://lmvasque.github.io/)
- - Pedro Cuenca
- - Sergio Morales
  - [Fernando Alva-Manchego](https://feralvam.github.io/)

  """
 
  We aim to contribute to the development of **neural models for readability assessment for Spanish**, following previous work for [English](https://aclanthology.org/2021.cl-1.6/) and [Filipino](https://aclanthology.org/2021.ranlp-1.69/).


+ ### Dataset

+ We curated a new dataset that combines existing corpora for readability assessment (i.e., [Newsela](https://newsela.com/data)) with texts scraped from webpages aimed at learners of Spanish as a second language. Each text in the Newsela corpus is annotated with the grade level (in the U.S. educational system) that it was written for. For the scraped texts, we selected webpages that explicitly indicate the [CEFR](https://en.wikipedia.org/wiki/Common_European_Framework_of_Reference_for_Languages) level of each text.

+ Each text has two readability labels, according to the following mapping:
+
+ | Source | 2-class: *Simple* | 2-class: *Complex* | 3-class: *Basic* | 3-class: *Intermediate* | 3-class: *Advanced* |
+ |------------------|--------------|--------------|-----------------|-----------------|------------------|
+ | Texts with CEFR levels | A1, A2, B1 | B2, C1, C2 | A1, A2 | B1, B2 | C1, C2 |
+ | Newsela corpus | Versions 3-4 | Versions 0-1 | Grade levels 2-5 | Grade levels 6-8 | Grade levels 9-12 |
+
+
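As an illustration only, the label mapping in the table above could be written as plain lookups. This is a minimal sketch: the dictionary and function names are hypothetical and do not come from the project's code, and inputs the table does not cover (for example, Newsela version 2) return `None` rather than a guessed label.

```python
from typing import Optional

# CEFR level -> label, transcribed from the mapping table (names are hypothetical).
CEFR_TO_2CLASS = {
    "A1": "simple", "A2": "simple", "B1": "simple",
    "B2": "complex", "C1": "complex", "C2": "complex",
}
CEFR_TO_3CLASS = {
    "A1": "basic", "A2": "basic",
    "B1": "intermediate", "B2": "intermediate",
    "C1": "advanced", "C2": "advanced",
}


def newsela_version_to_2class(version: int) -> Optional[str]:
    """Map a Newsela article version to a 2-class label, per the table."""
    if version in (3, 4):
        return "simple"
    if version in (0, 1):
        return "complex"
    return None  # the table does not assign a label to other versions


def newsela_grade_to_3class(grade: int) -> Optional[str]:
    """Map a Newsela grade level to a 3-class label, per the table."""
    if 2 <= grade <= 5:
        return "basic"
    if 6 <= grade <= 8:
        return "intermediate"
    if 9 <= grade <= 12:
        return "advanced"
    return None  # grade outside the ranges listed in the table
```

Note that the two labels for one text need not nest: a B1 text is *Simple* in the 2-class scheme but *Intermediate* in the 3-class scheme.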
+ Some texts in the dataset may be too long to fit within a model's maximum input length. We therefore created two versions of the dataset, dividing each text into [sentences](https://huggingface.co/datasets/hackathon-pln-es/readability-es-sentences) and [paragraphs](https://huggingface.co/datasets/hackathon-pln-es/readability-es-paragraphs).
+
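The commit does not describe how the texts were segmented, so the following is only a rough sketch of one way to produce paragraph-level and sentence-level units. It assumes paragraphs are separated by blank lines and sentences end in `.`, `!` or `?`; both rules and both function names are assumptions, and a real pipeline might rely on a dedicated sentence tokenizer instead.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines (assumed paragraph separator)."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Naive split after sentence-final punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [s for s in parts if s]


text = "Hola. ¿Cómo estás? Bien.\n\nOtro párrafo."
paragraphs = split_paragraphs(text)
sentences = [s for p in paragraphs for s in split_sentences(p)]
```

Under this scheme each unit would inherit the readability label of the text it came from, which is an assumption about the dataset rather than something stated in the commit.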
+ We also scraped several texts from the ["Corpus de Aprendices del Español" (CAES)](http://galvan.usc.es/caes/). However, due to time constraints, we leave experiments with it for future work. The data is available [here](https://huggingface.co/datasets/hackathon-pln-es/readability-es-caes).
+
+ ### Models
+
+ Our models are based on [BERTIN](https://huggingface.co/bertin-project). We fine-tuned [bertin-roberta-base-spanish](https://huggingface.co/bertin-project/bertin-roberta-base-spanish) on the different versions of our collected dataset. The following models are available:
+
+ - [2-class sentence-level](https://huggingface.co/hackathon-pln-es/readability-es-sentences)
+ - [2-class paragraph-level](https://huggingface.co/hackathon-pln-es/readability-es-paragraphs)
+ - [3-class sentence-level](https://huggingface.co/hackathon-pln-es/readability-es-3class-sentences)
+ - [3-class paragraph-level](https://huggingface.co/hackathon-pln-es/readability-es-3class-paragraphs)
+
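For readers who want to try one of the models listed above, a minimal sketch using the `transformers` `pipeline` API might look like the following. The model IDs are taken from the links in the list; the `MODEL_IDS` table and the `load_classifier` helper are hypothetical names introduced here, and the label strings a model returns depend on its configuration, which this commit does not show.

```python
# Model IDs copied from the list above; the lookup structure is illustrative.
MODEL_IDS = {
    ("2-class", "sentence"): "hackathon-pln-es/readability-es-sentences",
    ("2-class", "paragraph"): "hackathon-pln-es/readability-es-paragraphs",
    ("3-class", "sentence"): "hackathon-pln-es/readability-es-3class-sentences",
    ("3-class", "paragraph"): "hackathon-pln-es/readability-es-3class-paragraphs",
}


def load_classifier(n_classes: str = "2-class", granularity: str = "sentence"):
    """Return a text-classification pipeline for the chosen model.

    The import is done lazily so this module can be loaded without
    transformers installed; calling the function downloads the model.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_IDS[(n_classes, granularity)])
```

Calling `load_classifier("3-class", "paragraph")("Un texto en español.")` should return a list with one dictionary holding a `label` and a `score`; how the label names correspond to *Basic*, *Intermediate* and *Advanced* should be checked against each model card.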
+ More details about how we trained these models can be found in our [report](https://wandb.ai/readability-es/readability-es/reports/Texts-Readability-Analysis-for-Spanish--VmlldzoxNzU2MDUx).

  ### Team

  - [Laura Vásquez-Rodríguez](https://lmvasque.github.io/)
+ - [Pedro Cuenca](https://twitter.com/pcuenq/)
+ - [Sergio Morales](https://www.fireblend.com/)
  - [Fernando Alva-Manchego](https://feralvam.github.io/)

  """