Update README.md
README.md (changed)
This model was trained on [COALAS](https://github.com/lirondos/coalas/), a corpus of Spanish newswire annotated with unassimilated lexical borrowings.

|**Total** |372,701 |3,038 |123 |1,683 |

## More info

More information about the dataset, model experimentation and error analysis can be found in the paper: *[Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling](https://aclanthology.org/2022.acl-long.268/)*.

## How to use
```python
for entity in sentence.get_spans():
    ...
```

## Citation
If you use this model, please cite the following reference:
|
```
@inproceedings{alvarez-mellado-lignos-2022-detecting,
    title = "Detecting Unassimilated Borrowings in {S}panish: {A}n Annotated Corpus and Approaches to Modeling",
    author = "{\'A}lvarez-Mellado, Elena and
      Lignos, Constantine",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.268",
    pages = "3868--3888",
    abstract = "This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task. We introduce a new annotated corpus of Spanish newswire rich in unassimilated lexical borrowings{---}words from one language that are introduced into another without orthographic adaptation{---}and use it to evaluate how several sequence labeling models (CRF, BiLSTM-CRF, and Transformer-based models) perform. The corpus contains 370,000 tokens and is larger, more borrowing-dense, OOV-rich, and topic-varied than previous corpora available for this task. Our results show that a BiLSTM-CRF model fed with subword embeddings along with either Transformer-based embeddings pretrained on codeswitched data or a combination of contextualized word embeddings outperforms results obtained by a multilingual BERT-based model.",
}
```