Update: This model has been moved to linhd-postdata/alberti-bert-base-multilingual-cased, where it will be maintained and updated.
ALBERTI is a set of two BERT-based multilingual models for poetry: one for verses and another for stanzas. This model was further trained on the verses of the PULPO corpus using Flax; the training scripts are included.
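Since this is a BERT-style masked language model, one way to try it is through the `fill-mask` pipeline of the `transformers` library. The sketch below is illustrative rather than an official usage recipe: it assumes `transformers` and a backend are installed, that the mask token is `[MASK]` (the default for BERT multilingual cased tokenizers), and that the relocated model id from the update note is the one to load. The helper names `mask_word` and `predict_masked` and the sample verse are hypothetical and do not come from the model authors.

```python
# Model id taken from the update note; the rest of this sketch is an assumption.
MODEL_ID = "linhd-postdata/alberti-bert-base-multilingual-cased"


def mask_word(verse: str, word: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `word` in `verse` with the mask token."""
    if word not in verse:
        raise ValueError(f"{word!r} does not occur in the verse")
    return verse.replace(word, mask_token, 1)


def predict_masked(masked_verse: str, top_k: int = 5):
    """Return (token, score) pairs proposed by the model for the masked slot.

    The import is deferred so that the helper above can be used without
    transformers installed; the first call downloads the model weights.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in fill(masked_verse, top_k=top_k)]


# Example verse (chosen for illustration only).
masked = mask_word("En tanto que de rosa y azucena", "rosa")
```

Calling `predict_masked(masked)` would then return the model's top candidates for the hidden word together with their scores. In real use, reading the mask token from `fill.tokenizer.mask_token` is safer than hard-coding it.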
PULPO, the Prodigious Unannotated Literary Poetry Corpus, is a set of multilingual corpora of verses and stanzas with over 95M words.
- Disco v3
- Corpus of Spanish Golden-Age Sonnets
- Corpus general de poesía lírica castellana del Siglo de Oro
- Gongocorpus - source
Also, we obtained the following corpora from these sources:
Team members:
- Álvaro Pérez (alvp)
- Javier de la Rosa (versae)
- Aitor Díaz (aitordiaz)
- Elena González-Blanco
- Salvador Ros (salva)
Useful links:
- Community Week timeline
- Community Week README
- Community Week thread
- Community Week channel
- Masked Language Modelling example scripts
- Model Repository
This project would not have been possible without the infrastructure and resources provided by HuggingFace and Google Cloud. We also thank the POSTDATA Project (ERC-StG-679528) and the Computational Literary Studies Infrastructure (CLS INFRA No. 101004984) of the European Union's Horizon 2020 research and innovation programme for their support and time allowance.