---
language:
- ga
license: apache-2.0
tags:
- irish
- electra
widget:
- text: "Ceoltóir [MASK] ab ea Johnny Cash."
---

# gaELECTRA
[gaELECTRA](https://arxiv.org/abs/2107.12930) is an ELECTRA model trained on 7.9M Irish sentences. For more details, including the hyperparameters and pretraining corpora used, please refer to our paper. For fine-tuning this model on a token classification task, e.g. Named Entity Recognition, use the discriminator model.

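As a minimal sketch of the two usage modes above — the discriminator for token-classification fine-tuning and the generator for fill-mask queries — the following uses the 🤗 `transformers` library. The checkpoint ids shown are assumptions; verify the exact names under the DCU-NLP organization on the Hugging Face Hub.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    pipeline,
)

# Hypothetical Hub ids -- check the DCU-NLP organization on the
# Hugging Face Hub for the exact checkpoint names.
DISCRIMINATOR_ID = "DCU-NLP/electra-base-irish-cased-discriminator-v1"
GENERATOR_ID = "DCU-NLP/electra-base-irish-cased-generator-v1"


def load_for_ner(num_labels: int):
    """Load the discriminator with a fresh token-classification head,
    ready for fine-tuning on a task such as Named Entity Recognition."""
    tokenizer = AutoTokenizer.from_pretrained(DISCRIMINATOR_ID)
    model = AutoModelForTokenClassification.from_pretrained(
        DISCRIMINATOR_ID, num_labels=num_labels
    )
    return tokenizer, model


def predict_masked(sentence: str):
    """Fill in the [MASK] token with the generator, as in the widget
    example above; returns (token, score) pairs."""
    fill_mask = pipeline("fill-mask", model=GENERATOR_ID)
    return [(p["token_str"], p["score"]) for p in fill_mask(sentence)]


if __name__ == "__main__":
    for token, score in predict_masked("Ceoltóir [MASK] ab ea Johnny Cash."):
        print(f"{token}\t{score:.3f}")
```

Note that the discriminator is not trained with a masked-language-modelling objective, so fill-mask queries should go to the generator checkpoint.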
### Limitations and bias
Some of the data used to pretrain gaELECTRA was scraped from the web, which potentially contains ethically problematic text (bias, hate, adult content, etc.). Consequently, downstream tasks/applications using gaELECTRA should be thoroughly tested with respect to ethical considerations.

### BibTeX entry and citation info
If you use this model in your research, please consider citing our paper:

```bibtex
@article{DBLP:journals/corr/abs-2107-12930,
  author        = {James Barry and
                   Joachim Wagner and
                   Lauren Cassidy and
                   Alan Cowap and
                   Teresa Lynn and
                   Abigail Walsh and
                   M{\'{\i}}che{\'{a}}l J. {\'{O}} Meachair and
                   Jennifer Foster},
  title         = {gaBERT - an Irish Language Model},
  journal       = {CoRR},
  volume        = {abs/2107.12930},
  year          = {2021},
  url           = {https://arxiv.org/abs/2107.12930},
  archivePrefix = {arXiv},
  eprint        = {2107.12930},
  timestamp     = {Fri, 30 Jul 2021 13:03:06 +0200},
  biburl        = {https://dblp.org/rec/journals/corr/abs-2107-12930.bib},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}
```