---
license: apache-2.0
language:
- nl
metrics:
- f1
- exact_match
library_name: transformers
tags:
- dutch
- restaurant
- mt5
---

# Dutch-Restaurant-mT5-Small

The Dutch-Restaurant-mT5-Small model was introduced in [A Simple yet Effective Framework for Few-Shot Aspect-Based Sentiment Analysis (SIGIR'23)](https://doi.org/10.1145/3539618.3591940) by Zengzhi Wang, Qiming Xie, and Rui Xia.

Details are available at [GitHub: FS-ABSA](https://github.com/nustm/fs-absa) and in the [SIGIR'23 paper](https://doi.org/10.1145/3539618.3591940).

# Model Description

To bridge the domain gap between general pre-training and the task of interest in a specific domain (i.e., `restaurant` in this repo), we conducted *domain-adaptive pre-training*, i.e., continued pre-training of the language model (i.e., mT5-small) on an unlabeled corpus from the domain of interest (i.e., `restaurant`) with the *text-infilling objective* (corruption rate of 15% and average span length of 1). We collected 100k relevant unlabeled reviews from Yelp for the restaurant domain and then translated them into Dutch with the DeepL translator. For pre-training, we employed the [Adafactor](https://arxiv.org/abs/1804.04235) optimizer with a batch size of 16 and a constant learning rate of 1e-4.

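To make the text-infilling objective concrete, here is a minimal, self-contained sketch of T5-style span corruption with span length 1: each dropped token is replaced by a sentinel in the input, and the target lists each sentinel followed by the dropped token. This is an illustration only, not the actual pre-training code used for this model, and it operates on whitespace tokens rather than subwords.

```python
import random

def infill_corrupt(tokens, corruption_rate=0.15, seed=0):
    """Toy text-infilling corruption (span length 1, T5-style sentinels)."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * corruption_rate))
    masked = set(rng.sample(range(len(tokens)), n_mask))
    inputs, targets, sentinel = [], [], 0
    for i, tok in enumerate(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")     # dropped token becomes a sentinel
            targets += [f"<extra_id_{sentinel}>", tok]  # target reconstructs the span
            sentinel += 1
        else:
            inputs.append(tok)
    return " ".join(inputs), " ".join(targets)

source = "de pizza hier is echt heerlijk en de bediening is vriendelijk".split()
corrupted, target = infill_corrupt(source)
print(corrupted)
print(target)
```

The model is then trained to map `corrupted` to `target`, which is what the snippet's sentinel layout mirrors.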
Our model can be seen as an enhanced mT5 model for the restaurant domain, and it can be used for various restaurant-related NLP tasks, including but not limited to fine-grained sentiment analysis (ABSA), product-relevant question answering (PrQA), and text style transfer.

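As a rough illustration, such tasks can be cast as text-to-text problems for a seq2seq model like this one. The prompt and label templates below are hypothetical (the actual templates used in the FS-ABSA paper may differ):

```python
def format_absa(review, aspect, sentiment=None):
    """Hypothetical text-to-text formatting for ABSA fine-tuning.

    The source encodes the review plus the aspect term; the target is the
    sentiment label string the model learns to generate.
    """
    source = f"beoordeling: {review} aspect: {aspect}"
    target = sentiment  # e.g. "positief" / "negatief" / "neutraal"
    return source, target

src, tgt = format_absa(
    "De pizza was heerlijk, maar de bediening was traag.",
    aspect="bediening",
    sentiment="negatief",
)
print(src)
print(tgt)
```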
```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("NUSTM/dutch-restaurant-mt5-small")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("NUSTM/dutch-restaurant-mt5-small")

>>> input_ids = tokenizer(
...     "De pizza hier is heerlijk!!", return_tensors="pt"
... ).input_ids  # Batch size 1
>>> outputs = model.generate(input_ids)  # encoder-decoder models need generate() (or decoder inputs) for inference
```

# Citation

If you find this work helpful, please cite our paper as follows:

```bibtex
@inproceedings{wang2023fs-absa,
  author = {Wang, Zengzhi and Xie, Qiming and Xia, Rui},
  title = {A Simple yet Effective Framework for Few-Shot Aspect-Based Sentiment Analysis},
  year = {2023},
  isbn = {9781450394086},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3539618.3591940},
  doi = {10.1145/3539618.3591940},
  booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  numpages = {6},
  location = {Taipei, Taiwan},
  series = {SIGIR '23}
}
```

Note that the complete citation will be updated once the paper is published in the SIGIR 2023 conference proceedings.