---
license: mit
language:
- af
- az
- be
- bg
- bn
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gl
- gu
- he
- hi
- hu
- hy
- id
- is
- it
- ka
- kk
- ky
- la
- lt
- lv
- mg
- mk
- ml
- mt
- nl
- pa
- pl
- pt
- ro
- ru
- sk
- sq
- sv
- ta
- te
- th
- tr
- uk
- yi
- yo
datasets:
- benjamin/compoundpiece
---

CompoundPiece model trained only on Stage 1 training data (self-supervised training on hyphenated and non-hyphenated words scraped from the web). See [CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models](https://arxiv.org/abs/2305.14214).

# Citation

```
@article{minixhofer2023compoundpiece,
  title={CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models},
  author={Minixhofer, Benjamin and Pfeiffer, Jonas and Vuli{\'c}, Ivan},
  journal={arXiv preprint arXiv:2305.14214},
  year={2023}
}
```

# License

MIT