A Transformer-based Persian Language Model Further Pretrained on Persian Poetry

ALBERT was first introduced by Hooshvare with 30,000 vocabulary size as lite BERT for self-supervised learning of language representations for the Persian language. Here we wanted to utilize its capabilities by pretraining it on a large corpse of Persian poetry. This model has been post-trained on 80 percent of poetry verses of the Persian poetry dataset - Ganjoor- and has been evaluated on the other 20 percent.

Downloads last month
Hosted inference API
Mask token: [MASK]
This model can be loaded on the Inference API on-demand.