license: mit
language:
  - af
  - az
  - be
  - bg
  - bn
  - ca
  - cs
  - cy
  - da
  - de
  - el
  - en
  - eo
  - es
  - et
  - eu
  - fa
  - fi
  - fr
  - fy
  - ga
  - gl
  - gu
  - he
  - hi
  - hu
  - hy
  - id
  - is
  - it
  - ka
  - kk
  - ky
  - la
  - lt
  - lv
  - mg
  - mk
  - ml
  - mt
  - nl
  - pa
  - pl
  - pt
  - ro
  - ru
  - sk
  - sq
  - sv
  - ta
  - te
  - th
  - tr
  - uk
  - yi
  - yo
datasets:
  - benjamin/compoundpiece

This is the CompoundPiece model trained only on Stage 1 training data, i.e. with self-supervised training on hyphenated and non-hyphenated words scraped from the web. For details, see the paper "CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models" (arXiv:2305.14214).
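
To make the Stage 1 setup more concrete, the following is a minimal sketch of how a scraped word might be turned into a self-supervised training pair. It assumes the hyphen marks constituent boundaries, so the input is the word with hyphens removed and the target is the hyphenated form; non-hyphenated words map to themselves. The helper name `make_stage1_pair` and the example words are hypothetical and are not taken from the CompoundPiece code or dataset.

```python
from typing import List, Tuple


def make_stage1_pair(word: str, hyphen: str = "-") -> Tuple[str, str]:
    """Hypothetical helper: build an (input, target) pair from one scraped word.

    Assumption: hyphens indicate compound constituent boundaries.
    """
    # Drop empty pieces so leading, trailing, or doubled hyphens are ignored.
    parts: List[str] = [p for p in word.split(hyphen) if p]
    source = "".join(parts)       # model input: hyphens removed
    target = hyphen.join(parts)   # model target: boundaries marked by hyphens
    return source, target


# Illustrative words only; a non-hyphenated word yields an unchanged target.
for w in ["high-profile", "bookshelf"]:
    print(make_stage1_pair(w))
# ('highprofile', 'high-profile')
# ('bookshelf', 'bookshelf')
```

Under this assumption, non-hyphenated words act as examples where no segmentation should be predicted; the paper describes the filtering and preprocessing actually used.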

Citation

@article{minixhofer2023compoundpiece,
  title={CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models},
  author={Minixhofer, Benjamin and Pfeiffer, Jonas and Vuli{\'c}, Ivan},
  journal={arXiv preprint arXiv:2305.14214},
  year={2023}
}

License

MIT