---
license: apache-2.0
language:
- nl
metrics:
- f1
- exact_match
library_name: transformers
tags:
- dutch
- restaurant
- mt5
---
# Dutch-Restaurant-mT5-Small
The Dutch-Restaurant-mT5-Small model was introduced in [A Simple yet Effective Framework for Few-Shot Aspect-Based Sentiment Analysis (SIGIR'23)](https://doi.org/10.1145/3539618.3591940) by Zengzhi Wang, Qiming Xie, and Rui Xia.
Further details are available in the [GitHub repository (FS-ABSA)](https://github.com/nustm/fs-absa) and the [SIGIR'23 paper](https://doi.org/10.1145/3539618.3591940).
# Model Description
To bridge the domain (and language) gap between general pre-training and the task of interest in a specific domain (here, the `restaurant` domain), we conduct *domain-adaptive pre-training*,
i.e., we continue pre-training the language model (mT5-small) on an unlabeled corpus from the domain and language of interest with the *text-infilling objective*
(corruption rate of 15% and average span length of 1). Specifically, we collect 100k relevant unlabeled restaurant reviews from Yelp and translate them into Dutch with the DeepL translator.
For pre-training, we employ the [Adafactor](https://arxiv.org/abs/1804.04235) optimizer with a batch size of 16 and a constant learning rate of 1e-4.
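The actual pre-training scripts live in the GitHub repository; the snippet below is only a minimal, single-example sketch of what one text-infilling update could look like under the stated settings (15% corruption rate, average span length of 1, Adafactor with a constant learning rate of 1e-4). The `text_infilling` helper and the single-review batch are illustrative simplifications, not the real training code (which uses a batch size of 16 over the full 100k-review corpus), and the sentinel handling is deliberately simplified.

```python
import random

import torch
from transformers import Adafactor, AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")


def text_infilling(token_ids, corruption_rate=0.15):
    """Illustrative text infilling: mask ~15% of tokens (spans of length 1)
    with mT5 sentinel tokens; the target reconstructs the masked spans."""
    source, target, n_sentinels = [], [], 0
    for tok in token_ids:
        if random.random() < corruption_rate and n_sentinels < 100:
            sentinel = tokenizer.convert_tokens_to_ids(f"<extra_id_{n_sentinels}>")
            source.append(sentinel)
            target.extend([sentinel, tok])
            n_sentinels += 1
        else:
            source.append(tok)
    target.append(tokenizer.eos_token_id)
    return source, target


# One illustrative update step on a single review (the real setup batches 16 reviews).
review = "De pizza's hier zijn heerlijk!!!"
token_ids = tokenizer(review, add_special_tokens=False).input_ids
source, target = text_infilling(token_ids)

# Constant learning rate of 1e-4 requires disabling Adafactor's relative step sizes.
optimizer = Adafactor(
    model.parameters(), lr=1e-4, scale_parameter=False, relative_step=False
)

loss = model(
    input_ids=torch.tensor([source + [tokenizer.eos_token_id]]),
    labels=torch.tensor([target]),
).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```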
Our model can be seen as an enhanced mT5 model for the restaurant domain and can be used for various restaurant-related NLP tasks,
including fine-grained sentiment analysis (ABSA), product-relevant question answering (PrQA), and text style transfer. Example usage:
```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("NUSTM/dutch-restaurant-mt5-small")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("NUSTM/dutch-restaurant-mt5-small")
>>> input_ids = tokenizer(
... "De pizza's hier zijn heerlijk!!!", return_tensors="pt"
... ).input_ids # Batch size 1
>>> outputs = model.generate(input_ids, max_new_tokens=20)  # seq2seq generation
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
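Because the model is pre-trained with the text-infilling objective, it can also be prompted with mT5 sentinel tokens and asked to fill in the masked span. The prompt below is purely illustrative and is not the task template used in the paper; see the GitHub repository for the actual fine-tuning format.

```python
>>> # Illustrative infilling prompt (not the paper's task template): the model
>>> # is asked to fill in the span masked by the <extra_id_0> sentinel.
>>> input_ids = tokenizer(
...     "De pizza's hier zijn <extra_id_0> !!!", return_tensors="pt"
... ).input_ids
>>> outputs = model.generate(input_ids, max_new_tokens=10)
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```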
# Citation
If you find this work helpful, please cite our paper as follows:
```bibtex
@inproceedings{10.1145/3539618.3591940,
author = {Wang, Zengzhi and Xie, Qiming and Xia, Rui},
title = {A Simple yet Effective Framework for Few-Shot Aspect-Based Sentiment Analysis},
year = {2023},
isbn = {9781450394086},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3539618.3591940},
doi = {10.1145/3539618.3591940},
abstract = {The pre-training and fine-tuning paradigm has become the main-stream framework in the field of Aspect-Based Sentiment Analysis (ABSA). Although it has achieved sound performance in the domains containing enough fine-grained aspect-sentiment annotations, it is still challenging to conduct few-shot ABSA in domains where manual annotations are scarce. In this work, we argue that two kinds of gaps, i.e., domain gap and objective gap, hinder the transfer of knowledge from pre-training language models (PLMs) to ABSA tasks. To address this issue, we introduce a simple yet effective framework called FS-ABSA, which involves domain-adaptive pre-training and text-infilling fine-tuning. We approach the End-to-End ABSA task as a text-infilling problem and perform domain-adaptive pre-training with the text-infilling objective, narrowing the two gaps and consequently facilitating the knowledge transfer. Experiments show that the resulting model achieves more compelling performance than baselines under the few-shot setting while driving the state-of-the-art performance to a new level across datasets under the fully-supervised setting. Moreover, we apply our framework to two non-English low-resource languages to demonstrate its generality and effectiveness.},
booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {1765–1770},
numpages = {6},
keywords = {few-shot learning, opinion mining, sentiment analysis},
location = {Taipei, Taiwan},
series = {SIGIR '23}
}
```