---
license: mit
---
# UCTopic

This repository contains the code of the model UCTopic and an easy-to-use tool, UCTopicTool, for <strong>Topic Mining</strong>, <strong>Unsupervised Aspect Extraction</strong> or <strong>Phrase Retrieval</strong>.

Our ACL 2022 paper: [UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining](https://arxiv.org/abs/2202.13469).

# Quick Links

- [Overview](#overview)
- [Pretrained Model](#pretrained-model)
- [Getting Started](#getting-started)
  - [UCTopic Model](#uctopic-model)
  - [UCTopicTool](#uctopictool)
- [Experiments in Paper](#experiments)
  - [Requirements](#requirements)
  - [Datasets](#datasets)
  - [Entity Clustering](#entity-clustering)
  - [Topic Mining](#topic-mining)
  - [Pretraining](#pretraining)
- [Contact](#contact)
- [Citation](#citation)

# Overview

We propose UCTopic, a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining. UCTopic is pretrained at a large scale to distinguish whether the contexts of two phrase mentions have the same semantics. The key to pretraining is positive pair construction from our phrase-oriented assumptions. However, we find that traditional in-batch negatives cause performance decay when finetuning on a dataset with a small number of topics. Hence, we propose cluster-assisted contrastive learning (CCL), which largely reduces noisy negatives by selecting negatives from clusters and further improves phrase representations for topics accordingly.

# Pretrained Model
Our released model:
| Model | Note|
|:-------------------------------|------|
|[uctopic-base](https://drive.google.com/file/d/1XQzi4E9ctdI373CK5O-pXQyBvOONssp1/view?usp=sharing)| Pretrained UCTopic model based on [LUKE-BASE](https://arxiv.org/abs/2010.01057)|

Unzip it to get the `uctopic-base` folder.

# Getting Started
We provide an easy-to-use phrase representation tool based on our UCTopic model. To use the tool, first install the `uctopic` package from PyPI
```bash
pip install uctopic
```
or install it directly from our code
```bash
python setup.py install
```

## UCTopic Model
After installing the package, you can load our model with just two lines of code
```python
from uctopic import UCTopic
model = UCTopic.from_pretrained('JiachengLi/uctopic-base')
```
The model will automatically download pre-trained parameters from [HuggingFace's models](https://huggingface.co/models). If you encounter any problem when loading the model via HuggingFace's API, you can also download the model manually from the table above and use `model = UCTopic.from_pretrained({PATH TO THE DOWNLOADED MODEL})`.

To get pre-trained <strong>phrase representations</strong>, our model inputs are the same as [LUKE](https://huggingface.co/docs/transformers/model_doc/luke). Note: please input only <strong>ONE</strong> span each time; otherwise, performance will decay according to our empirical results.

```python
from uctopic import UCTopicTokenizer, UCTopic

tokenizer = UCTopicTokenizer.from_pretrained('JiachengLi/uctopic-base')
model = UCTopic.from_pretrained('JiachengLi/uctopic-base')

text = "Beyoncé lives in Los Angeles."
entity_spans = [(17, 28)] # character-based entity span corresponding to "Los Angeles"

inputs = tokenizer(text, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
outputs, phrase_repr = model(**inputs)
```
`phrase_repr` is the phrase embedding (size `[768]`) of the phrase `Los Angeles`. `outputs` has the same format as the outputs from `LUKE`.
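
To sanity-check the embedding, you can encode a second phrase the same way and compare the two vectors. This is a minimal sketch of our own; the second sentence, its span `(13, 26)`, and the comparison code are illustrations, not part of the package API.

```python
import torch
import torch.nn.functional as F

# Hypothetical second example: encode "San Francisco" in a new context.
text2 = "She moved to San Francisco last year."
inputs2 = tokenizer(text2, entity_spans=[(13, 26)], add_prefix_space=True, return_tensors="pt")
with torch.no_grad():
    _, phrase_repr2 = model(**inputs2)

# Cosine similarity between the two phrase embeddings.
score = F.cosine_similarity(phrase_repr.view(1, -1), phrase_repr2.view(1, -1)).item()
print(score)
```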

## UCTopicTool
We provide a tool, `UCTopicTool`, built on `UCTopic` for efficient phrase encoding, topic mining (or unsupervised aspect extraction) and phrase retrieval.

### Initialization

`UCTopicTool` is initialized by giving the `model_name_or_path` and `device`.
```python
from uctopic import UCTopicTool

tool = UCTopicTool('JiachengLi/uctopic-base', device='cuda:0')
```

### Phrase Encoding

Phrases are encoded by our method `UCTopicTool.encode` in batches, which is more efficient than `UCTopic`.
```python
phrases = [["This place is so much bigger than others!", (0, 10)],
           ["It was totally packed and loud.", (15, 21)],
           ["Service was on the slower side.", (0, 7)],
           ["I ordered 2 mojitos: 1 lime and 1 mango.", (12, 19)],
           ["The ingredient weren't really fresh.", (4, 14)]]

embeddings = tool.encode(phrases) # len(embeddings) is equal to len(phrases)
```
**Note**: Each instance in `phrases` contains only one sentence and one span (character-level position) in the format `[sentence, span]`.

Arguments for `UCTopicTool.encode` are as follows (a short usage sketch follows the list),
* **phrase** (List) - A list of `[sentence, span]` to be encoded.
* **return_numpy** (bool, *optional*, defaults to `False`) - Return `numpy.array` or `torch.Tensor`.
* **normalize_to_unit** (bool, *optional*, defaults to `True`) - Normalize all embeddings to unit vectors.
* **keepdim** (bool, *optional*, defaults to `True`) - Keep the dimension size `[instance_number, hidden_size]`.
* **batch_size** (int, *optional*, defaults to `64`) - The size of mini-batch in the model.
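
For example, using the flags documented above, you can request unit-normalized numpy embeddings directly; the shape check below is our own illustration of the expected output.

```python
import numpy as np

# Request numpy output with unit-normalized rows (both flags documented above).
embeddings = tool.encode(phrases, return_numpy=True, normalize_to_unit=True)

assert isinstance(embeddings, np.ndarray)
print(embeddings.shape)  # expected: (len(phrases), 768)
```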

### Topic Mining and Unsupervised Aspect Extraction

The method `UCTopicTool.topic_mining` can mine topical phrases or conduct aspect extraction from sentences with or without spans.

```python
sentences = ["This place is so much bigger than others!",
             "It was totally packed and loud.",
             "Service was on the slower side.",
             "I ordered 2 mojitos: 1 lime and 1 mango.",
             "The ingredient weren't really fresh."]

spans = [[(0, 10)],                      # This place
         [(15, 21), (26, 30)],           # packed; loud
         [(0, 7)],                       # Service
         [(12, 19), (21, 27), (32, 39)], # mojitos; 1 lime; 1 mango
         [(4, 14)]]                      # ingredient
# len(sentences) is equal to len(spans)
output_data, topic_phrase_dict = tool.topic_mining(sentences, spans,
                                                   n_clusters=[15, 25])

# predict topics for new phrases
phrases = [["The food here is amazing!", (4, 8)],
           ["Lovely ambiance with live music!", (21, 31)]]

topics = tool.predict_topic(phrases)
```
**Note**: If `spans` is not given, `UCTopicTool` will extract noun phrases by [spaCy](https://spacy.io/).
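
For example, omitting `spans` triggers that documented default; this minimal sketch reuses the `sentences` from above.

```python
# No spans given: the tool mines noun chunks with spaCy before clustering.
output_data, topic_phrase_dict = tool.topic_mining(sentences, n_clusters=[15, 25])
```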

Arguments for `UCTopicTool.topic_mining` are as follows,

Data arguments:
* **sentences** (List) - A list of sentences for topic mining.
* **spans** (List, *optional*, defaults to `None`) - A list of span lists corresponding to sentences, e.g., `[[(0, 9), (5, 7)], [(1, 2)]]` with `len(sentences)==len(spans)`. If `None`, phrases are automatically mined from noun chunks.

Clustering arguments:
* **n_clusters** (int or List, *optional*, defaults to `2`) - The number of topics. When `n_clusters` is a list, `n_clusters[0]` and `n_clusters[1]` are the minimum and maximum numbers to search, and `n_clusters[2]` is the search step length (if not provided, it defaults to 1).
* **metric** (str, *optional*, defaults to `"cosine"`) - The metric to measure the distance between vectors: `"cosine"` or `"euclidean"`.
* **batch_size** (int, *optional*, defaults to `64`) - The size of mini-batch for phrase encoding.
* **max_iter** (int, *optional*, defaults to `300`) - The maximum number of kmeans iterations.

CCL-finetune arguments:
* **ccl_finetune** (bool, *optional*, defaults to `True`) - Whether to conduct the CCL-finetuning from the paper.
* **batch_size_finetune** (int, *optional*, defaults to `8`) - The size of mini-batch for finetuning.
* **max_finetune_num** (int, *optional*, defaults to `100000`) - The maximum number of training instances for finetuning.
* **finetune_step** (int, *optional*, defaults to `2000`) - The number of training steps for finetuning.
* **contrastive_num** (int, *optional*, defaults to `5`) - The number of negatives in contrastive learning.
* **positive_ratio** (float, *optional*, defaults to `0.1`) - The ratio of the most confident instances for finetuning.
* **n_sampling** (int, *optional*, defaults to `10000`) - The number of sampled examples for cluster number confirmation and finetuning. Set to `-1` to use the whole dataset.
* **n_workers** (int, *optional*, defaults to `8`) - The number of workers for preprocessing data.

Returns of `UCTopicTool.topic_mining` are as follows (a small inspection sketch follows the list),
* **output_data** (List) - A list of sentences with their corresponding phrases and topic numbers. Each element is `[sentence, [[start1, end1, topic1], [start2, end2, topic2]]]`.
* **topic_phrase_dict** (Dict) - A dictionary mapping each topic to the list of phrases under it, sorted by confidence scores, e.g., `{topic: [[phrase1, score1], [phrase2, score2]]}`.
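
Given the documented return formats, a minimal sketch for inspecting the results (the top-3 cutoff is our arbitrary choice):

```python
# Top-3 phrases per topic, using the documented {topic: [[phrase, score], ...]} format.
for topic_id, scored_phrases in topic_phrase_dict.items():
    print(topic_id, [phrase for phrase, score in scored_phrases[:3]])

# Sentences with their mined spans and topic assignments.
for sentence, span_topics in output_data:
    for start, end, topic_id in span_topics:
        print(topic_id, sentence[start:end])
```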

The method `UCTopicTool.predict_topic` predicts topic ids for new phrases based on your training results from `UCTopicTool.topic_mining`. Its inputs are the same as those of `UCTopicTool.encode`, and it returns a list of topic ids (int).

### Phrase Similarities and Retrieval

The method `UCTopicTool.similarity` computes the cosine similarities between two groups of phrases:

```python
phrases_a = [["This place is so much bigger than others!", (0, 10)],
             ["It was totally packed and loud.", (15, 21)]]

phrases_b = [["Service was on the slower side.", (0, 7)],
             ["I ordered 2 mojitos: 1 lime and 1 mango.", (12, 19)],
             ["The ingredient weren't really fresh.", (4, 14)]]

similarities = tool.similarity(phrases_a, phrases_b)
```
Arguments for `UCTopicTool.similarity` are as follows,
* **queries** (List) - A list of `[sentence, span]` as queries.
* **keys** (List or `numpy.array`) - A list of `[sentence, span]` as keys, or phrase representations (`numpy.array`) from `UCTopicTool.encode`.
* **batch_size** (int, *optional*, defaults to `64`) - The size of mini-batch in the model.

`UCTopicTool.similarity` returns a `numpy.array` containing the similarities between the phrase pairs in the two groups.
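
The exact shape is not stated above; assuming the array is `[len(queries), len(keys)]`, the best key per query can be read off with `argmax`:

```python
import numpy as np

# Assumed shape: [len(phrases_a), len(phrases_b)].
best_keys = np.argmax(similarities, axis=1)
for query, key_idx in zip(phrases_a, best_keys):
    print(query[0], '->', phrases_b[key_idx][0])
```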

The methods `UCTopicTool.build_index` and `UCTopicTool.search` are used for phrase retrieval:
```python
phrases = [["This place is so much bigger than others!", (0, 10)],
           ["It was totally packed and loud.", (15, 21)],
           ["Service was on the slower side.", (0, 7)],
           ["I ordered 2 mojitos: 1 lime and 1 mango.", (12, 19)],
           ["The ingredient weren't really fresh.", (4, 14)]]

# query with multiple phrases
query1 = [["The food here is amazing!", (4, 8)],
          ["Lovely ambiance with live music!", (21, 31)]]

# query with a single phrase
query2 = ["The food here is amazing!", (4, 8)]

tool.build_index(phrases)
results = tool.search(query1, top_k=3)
# or
results = tool.search(query2, top_k=3)
```
We also support [faiss](https://github.com/facebookresearch/faiss), an efficient similarity search library. Just install the package following the [instructions](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md) and `UCTopicTool` will automatically use `faiss` for efficient search.

`UCTopicTool.search` returns the ranked top-k phrases for each query.
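
The element layout of `results` is not spelled out above, so treat the following as a sketch under the assumption that each query yields a list of `(phrase, score)` pairs:

```python
# Assumed layout: one list of (phrase, score) pairs per query in query1.
for query, hits in zip(query1, results):
    print(query[0])
    for phrase, score in hits:
        print('   ', phrase, score)
```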

### Save and Load finetuned UCTopicTool

The methods `UCTopicTool.save` and `UCTopicTool.load` are used to save and load all parameters of `UCTopicTool`.

Save:
```python
tool = UCTopicTool('JiachengLi/uctopic-base', 'cuda:0')
# finetune UCTopic with CCL
output_data, topic_phrase_dict = tool.topic_mining(sentences, spans,
                                                   n_clusters=[15, 25])

tool.save({PATH TO SAVE DIRECTORY})
```

Load:
```python
tool = UCTopicTool('JiachengLi/uctopic-base', 'cuda:0')
tool.load({PATH TO SAVE DIRECTORY})
```
The loaded parameters will be used by all methods introduced above (encoding, topic mining, phrase similarity and retrieval).

# Experiments
In this section, we describe how to reproduce the experiments in our paper.

## Requirements
First, install PyTorch by following the instructions from [the official website](https://pytorch.org). To faithfully reproduce our results, please use the correct `1.9.0` version corresponding to your platform/CUDA version.

Then run the following script to install the remaining dependencies,
```bash
pip install -r requirements.txt
```

Download the `en_core_web_sm` model from spaCy,
```bash
python -m spacy download en_core_web_sm
```

## Datasets
The downstream datasets used in our experiments can be downloaded from [here](https://drive.google.com/file/d/1dVIp9li1Wdh0JgU8slsWm0ObcitbQtSL/view?usp=sharing).

## Entity Clustering
The config file for entity clustering is `clustering/consts.py` and most arguments are self-explanatory. Please set up `--gpu` and `--data_path` before running. The clustering scores will be printed.

Clustering with our pre-trained phrase embeddings:
```bash
python clustering.py --gpu 0
```
Clustering with our pre-trained phrase embeddings and the Cluster-Assisted Contrastive Learning (CCL) proposed in our paper:
```bash
python clustering_ccl_finetune.py --gpu 0
```

## Topic Mining
The config file for topic mining is `topic_modeling/consts.py`.

**Key Argument Table**
| Arguments | Description |
|:-----------------|:-----------:|
| --num_classes |**Min** and **max** numbers of classes, e.g., `[5, 15]`. Our model will find the class number by [silhouette_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html) (a selection sketch follows the table).|
| --sample_num_cluster |Number of sampled phrases to confirm the class number.|
| --sample_num_finetune|Number of sampled phrases for CCL finetuning.|
| --contrastive_num|Number of negative classes for CCL finetuning.|
| --finetune_step | CCL finetuning steps (maximum global steps for finetuning).|
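
The actual selection code lives in the repository; as a minimal sketch of the idea (our assumption, not the repo's implementation), silhouette-based class-number search with scikit-learn looks like this:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def select_num_classes(embeddings: np.ndarray, num_classes=(5, 15)) -> int:
    """Pick the k in [min, max] whose KMeans clustering maximizes the silhouette score."""
    best_k, best_score = num_classes[0], -1.0
    for k in range(num_classes[0], num_classes[1] + 1):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(embeddings)
        score = silhouette_score(embeddings, labels, metric="cosine")
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```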

**Tips**: Please tune `--batch_size` or `--contrastive_num` to fit your GPU memory.

Topic mining with our pre-trained phrase embeddings and the Cluster-Assisted Contrastive Learning (CCL) proposed in our paper:
```bash
python find_topic.py --gpu 0
```
**Outputs**

We output three files under `topic_results`:
| File Name | Description |
|:-----------------|:-----------:|
| `merged_phraes_pred_prob.pickle` |A dictionary of phrases with their topic numbers and prediction probabilities. The topic of a phrase is merged from all of its mentions. Format: `{phrase: [topic_id, probability]}`, e.g., `{'fair prices': [0, 0.34889686]}`.|
| `phrase_instances_pred.json`| A list of all mined phrase mentions. Each element is `[[doc_id, start, end, phrase_mention], topic_id]`.|
| `topics_phrases.json`|A dictionary of topics and their phrases sorted by probability: `{'topic_id': [[phrase1, prob1], [phrase2, prob2]]}`.|

## Pretraining

**Data**

For unsupervised pretraining of UCTopic, we use articles and spans with links from English Wikipedia and Wikidata. Our processed dataset can be downloaded from [here](https://drive.google.com/file/d/1wflsmhPI9J0ZA6aVRl2mQjHIE6JIvzAv/view?usp=sharing).

**Training scripts**

We provide example training scripts and our default training parameters for unsupervised training of UCTopic in `run_example.sh`.

```bash
bash run_example.sh
```

Descriptions of the arguments can be found in `pretrain.py`. All other arguments are standard Huggingface `transformers` training arguments.

**Convert models**

Our pretrained checkpoints are slightly different from the checkpoint `uctopic-base`. Please refer to `convert_uctopic_parameters.py` to convert them.

# Contact

If you have any questions related to the code or the paper, feel free to email Jiacheng (`j9li@eng.ucsd.edu`). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to describe the problem in detail so we can help you better and faster!

# Citation

Please cite our paper if you use UCTopic in your work:

```bibtex
@article{Li2022UCTopicUC,
  title={UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining},
  author={Jiacheng Li and Jingbo Shang and Julian McAuley},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.13469}
}
```