Spaces: somosnlp-hackathon-2022/readability-assessment-spanish

feralvam committed on Apr 3, 2022

Commit f7db750 • 1 Parent(s): e2f76e0

Update app.py

Files changed (1)

app.py CHANGED

@@ -28,13 +28,12 @@ We curated a new dataset that combines existing corpora for readability assessme
 Each text has two readability labels, according to the following mapping:
-|                  |            2-class          |                       3-class                        |
+|                  | 2-class      |              |                 | 3-class         |                  |
 |------------------|--------------|--------------|-----------------|-----------------|------------------|
-|                  |   *Simple*   |    *Complex* |     *Basic*     |  *Intermediate* |    *Advanced*    |
+|                  | Simple       | Complex      | Basic           | Intermediate    | Advanced         |
 | With CERF Levels | A1, A2, B1   | B2, C1, C2   | A1, A2          | B1,B2           | C1,C2            |
 | Newsela Corpus   | Versions 3-4 | Versions 0-1 | Grade Level 2-5 | Grade Level 6-8 | Grade Level 9-12 |
 In addition, texts in the dataset could be too long to fit in a model. As such, we created two versions of the dataset, dividing each text into [sentences](https://huggingface.co/datasets/hackathon-pln-es/readability-es-sentences) and [paragraphs](https://huggingface.co/datasets/hackathon-pln-es/readability-es-paragraphs).
 We also scraped several texts from the ["Corpus de Aprendices del Español" (CAES)](http://galvan.usc.es/caes/). However, due to the time constraints, we leave experiments with it for future work. The data is available [here](https://huggingface.co/datasets/hackathon-pln-es/readability-es-caes).
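
For readers who want the label mapping from the table in the diff in executable form, here is a minimal sketch. It covers the row labeled "With CERF Levels" (that is, the CEFR levels A1 to C2) and the Newsela grade-level columns. The names `TWO_CLASS`, `THREE_CLASS`, `labels_for_cefr`, and `three_class_for_newsela_grade` are hypothetical and are not taken from app.py; the lowercase label strings are likewise an assumption.

```python
from typing import Tuple

# Hypothetical lookup tables transcribed from the mapping table in the diff.
# The label strings ("simple", "basic", ...) are illustrative assumptions.
TWO_CLASS = {
    "A1": "simple", "A2": "simple", "B1": "simple",
    "B2": "complex", "C1": "complex", "C2": "complex",
}
THREE_CLASS = {
    "A1": "basic", "A2": "basic",
    "B1": "intermediate", "B2": "intermediate",
    "C1": "advanced", "C2": "advanced",
}


def labels_for_cefr(level: str) -> Tuple[str, str]:
    """Return the (2-class, 3-class) labels for a CEFR level such as 'B1'."""
    key = level.strip().upper()
    if key not in TWO_CLASS:
        raise ValueError(f"Unknown CEFR level: {level!r}")
    return TWO_CLASS[key], THREE_CLASS[key]


def three_class_for_newsela_grade(grade: int) -> str:
    """Return the 3-class label for a Newsela grade level (2 to 12)."""
    if 2 <= grade <= 5:
        return "basic"
    if 6 <= grade <= 8:
        return "intermediate"
    if 9 <= grade <= 12:
        return "advanced"
    raise ValueError(f"Grade level outside the table's range: {grade}")
```

Note that B1 counts as simple in the 2-class scheme but as intermediate in the 3-class scheme, so the two label sets are not nested. The Newsela 2-class column is keyed by article version rather than grade level, and the table lists only versions 0-1 and 3-4, so this sketch leaves that case out.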