lecslab/glosslm
Multilingual IGT corpora and pretrained models
Note: The base GlossLM model, pretrained on 450k examples (the train split) spanning nearly 2k languages.
Note: Base GlossLM, with glosses normalized following the UniMorph schema. Excludes segmented examples for the evaluation languages.
Note: The full pretraining corpus, with 450k examples spanning nearly 2k languages.
Note: The pretraining corpus, divided into train/dev/test splits for experiments.
Note: The pretraining corpus, split and with glosses normalized following the UniMorph schema.
Note: The GlossLM model, excluding segmented examples for the evaluation languages.