DCU-NLP's picture
Create README.md
a7fbe12
metadata
language:
  - ga
license: apache-2.0
tags:
  - irish
  - electra
widget:
  - text: Ceolt贸ir [MASK] ab ea Johnny Cash.

gaELECTRA

gaELECTRA is an ELECTRA model trained on 7.9M Irish sentences. For more details, including the hyperparameters and pretraining corpora used please refer to our paper. For fine-tuning this model on a token classification task, e.g. Named Entity Recognition, use the discriminator model.

Limitations and bias

Some data used to pretrain gaBERT was scraped from the web which potentially contains ethically problematic text (bias, hate, adult content, etc.). Consequently, downstream tasks/applications using gaBERT should be thoroughly tested with respect to ethical considerations.

BibTeX entry and citation info

If you use this model in your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2107-12930,
  author    = {James Barry and
               Joachim Wagner and
               Lauren Cassidy and
               Alan Cowap and
               Teresa Lynn and
               Abigail Walsh and
               M{\'{\i}}che{\'{a}}l J. {\'{O}} Meachair and
               Jennifer Foster},
  title     = {gaBERT - an Irish Language Model},
  journal   = {CoRR},
  volume    = {abs/2107.12930},
  year      = {2021},
  url       = {https://arxiv.org/abs/2107.12930},
  archivePrefix = {arXiv},
  eprint    = {2107.12930},
  timestamp = {Fri, 30 Jul 2021 13:03:06 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2107-12930.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}