GlossLM - a lecslab Collection

Updated Sep 5, 2024

Multilingual interlinear glossed text (IGT) corpora and pretrained models

lecslab/glosslm

Updated Nov 4, 2024 • 137

Note: The base GlossLM model, pretrained on 450k examples (the train split) spanning nearly 2k languages.

lecslab/glosslm-unimorph-st_unseg_only

Updated Jun 13, 2024 • 104

Note: Base GlossLM, with glosses normalized following the UniMorph schema. Excludes segmented examples for the evaluation languages.

lecslab/glosslm-corpus

Updated Nov 4, 2024 • 451k • 97 • 1

Note: The full pretraining corpus, with 450k examples spanning nearly 2k languages.

lecslab/glosslm-corpus-split

Updated Mar 10, 2024 • 556k • 170

Note: The pretraining corpus, divided into train/dev/test splits for experiments.

lecslab/glosslm-corpus-split-unimorph

Updated Jun 8, 2024 • 556k • 40

Note: The pretraining corpus, split and with glosses normalized following the UniMorph schema.

lecslab/glosslm-st_unseg_only-v2

Updated Feb 7, 2024 • 75

Note: The GlossLM model, excluding segmented examples for the evaluation languages.
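
The corpora listed above are hosted on the Hugging Face Hub, so they can presumably be fetched with the `datasets` library. The following is a minimal sketch, not an official usage recipe: the repository IDs are copied from this listing, while the helper name `load_corpus` and the `DATASETS`/`MODELS` lists are hypothetical names introduced for illustration. It assumes `pip install datasets` and network access, and it makes no assumption about the exact split names inside each repository.

```python
# Repository IDs copied verbatim from the collection listing.
DATASETS = [
    "lecslab/glosslm-corpus",
    "lecslab/glosslm-corpus-split",
    "lecslab/glosslm-corpus-split-unimorph",
]
MODELS = [
    "lecslab/glosslm",
    "lecslab/glosslm-unimorph-st_unseg_only",
    "lecslab/glosslm-st_unseg_only-v2",
]


def load_corpus(repo_id="lecslab/glosslm-corpus-split"):
    """Download one of the collection's datasets from the Hugging Face Hub.

    `load_corpus` is a hypothetical helper, not part of any library.
    It rejects IDs that are not datasets in this collection.
    """
    if repo_id not in DATASETS:
        raise ValueError(f"{repo_id!r} is not a dataset in this collection")
    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset

    # With no `split` argument, load_dataset returns a dict-like object
    # holding every split the repository defines.
    return load_dataset(repo_id)
```

Calling `load_corpus()` and iterating over the result with `.items()` lists whichever splits the repository actually provides, which avoids hard-coding split names that this listing does not specify.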