Update app.py
app.py
CHANGED
@@ -49,6 +49,12 @@ Our models are based on [BERTIN](https://huggingface.co/bertin-project). We fine
Models showcased in the demo are marked with (*) above. More details about how we trained these models can be found in our [report](https://wandb.ai/readability-es/readability-es/reports/Texts-Readability-Analysis-for-Spanish--VmlldzoxNzU2MDUx).
+## Final Remarks
+
+- **Data.** One of the main challenges in the area of Automatic Readability Assessment is the availability of reliable data. For Spanish, in particular, the highest-quality existing dataset is Newsela. However, it has a restrictive license that prohibits publicly sharing its texts. In addition, since its texts are translations of original English news articles, they can suffer from [translationese](https://en.wiktionary.org/wiki/translationese), making them less suitable for training models that will analyse texts produced directly in Spanish. Therefore, our first challenge was to find texts that were originally written in Spanish *and* that contain information about their readability level. Unfortunately, we could not find any other large publicly available corpus, so we decided to combine texts scraped from several webpages. This also prevented us from developing models that could estimate readability at more fine-grained levels (e.g. CEFR levels), which was our original goal. Future work includes contacting editorial groups (similar to Newsela) that create texts for learners of Spanish as a second language and attempting to establish collaborations that could result in new language resources for the readability research community.
+
+- **Models.**
+
### Team

- [Laura Vásquez-Rodríguez](https://lmvasque.github.io/)