lirondos committed on
Commit
8f8c543
1 Parent(s): 74b687c

Update README.md

Files changed (1)
  1. README.md +14 -12
README.md CHANGED
@@ -56,7 +56,7 @@ This model was trained on [COALAS](https://github.com/lirondos/coalas/), a corpu
  |**Total** |372,701 |3,038 |123 |1,683 |

  ## More info
- More information about the dataset, model experimentation and error analysis can be found in the paper: *[Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling](https://arxiv.org/abs/2203.16169)*.
+ More information about the dataset, model experimentation and error analysis can be found in the paper: *[Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling](https://aclanthology.org/2022.acl-long.268/)*.

  ## How to use

@@ -91,16 +91,18 @@ for entity in sentence.get_spans():
  ## Citation
  If you use this model, please cite the following reference:
  ```
- @misc{https://doi.org/10.48550/arxiv.2203.16169,
- doi = {10.48550/ARXIV.2203.16169},
- url = {https://arxiv.org/abs/2203.16169},
- author = {Álvarez-Mellado, Elena and Lignos, Constantine},
- title = {Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling},
- publisher = {arXiv},
- year = {2022}
- archivePrefix = {arXiv},
- eprint = {2203.16169},
- primaryClass = {cs.CL},
- }
+ @inproceedings{alvarez-mellado-lignos-2022-detecting,
+ title = "Detecting Unassimilated Borrowings in {S}panish: {A}n Annotated Corpus and Approaches to Modeling",
+ author = "{\'A}lvarez-Mellado, Elena and
+ Lignos, Constantine",
+ booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+ month = may,
+ year = "2022",
+ address = "Dublin, Ireland",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2022.acl-long.268",
+ pages = "3868--3888",
+ abstract = "This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task. We introduce a new annotated corpus of Spanish newswire rich in unassimilated lexical borrowings{---}words from one language that are introduced into another without orthographic adaptation{---}and use it to evaluate how several sequence labeling models (CRF, BiLSTM-CRF, and Transformer-based models) perform. The corpus contains 370,000 tokens and is larger, more borrowing-dense, OOV-rich, and topic-varied than previous corpora available for this task. Our results show that a BiLSTM-CRF model fed with subword embeddings along with either Transformer-based embeddings pretrained on codeswitched data or a combination of contextualized word embeddings outperforms results obtained by a multilingual BERT-based model.",
+ }
  ```