---
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation
inference:
  parameters:
    do_sample: true
    max_length: 64
    top_k: 10
    temperature: 1
    num_return_sequences: 10
widget:
- text: >-
    Generate a Japanese question for this passage: Transformer (machine
    learning model) A transformer is a deep learning model that adopts the
    mechanism of self-attention, differentially weighting the significance of
    each part of the input (which includes the recursive output) data.
  example_title: Generate Japanese questions
- text: >-
    Generate a Arabic question for this passage: Transformer (machine learning
    model) A transformer is a deep learning model that adopts the mechanism of
    self-attention, differentially weighting the significance of each part of
    the input (which includes the recursive output) data.
  example_title: Generate Arabic questions
---
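
For reference, the `inference.parameters` block above holds the settings the hosted widget forwards to generation; the same settings written as plain Python keyword arguments look like this (a sketch for local use, not a Hub-specific API):

```python
# Generation settings mirroring the front matter's inference.parameters.
generation_kwargs = dict(
    do_sample=True,           # sample rather than greedy-decode
    max_length=64,            # cap on generated question length (in tokens)
    top_k=10,                 # sample only from the 10 most likely tokens
    temperature=1.0,          # leave the output distribution unscaled
    num_return_sequences=10,  # produce 10 candidate questions per passage
)
```

These can be passed to a `text2text-generation` pipeline call as `generator(input_text, **generation_kwargs)`.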

## Model description

An mT5-large query generation model trained with XOR QA data.

Used in the papers [Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation](https://arxiv.org/pdf/2206.10128.pdf) and [Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval]().

### How to use

```python
from transformers import pipeline

# Language codes supported by the model, mapped to the names used in the prompt.
lang2mT5 = dict(
    ar='Arabic',
    bn='Bengali',
    fi='Finnish',
    ja='Japanese',
    ko='Korean',
    ru='Russian',
    te='Telugu',
)
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = 'A transformer is a deep learning model that adopts the mechanism of self-attention, differentially ' \
          'weighting the significance of each part of the input (which includes the recursive output) data.'

model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-base'

# Fill the prompt template for the target language (Japanese here).
input_text = PROMPT.format_map({'lang': lang2mT5['ja'],
                                'title': title,
                                'passage': passage})

generator = pipeline(model=model_name_or_path,
                     task='text2text-generation',
                     device='cuda:0',
                     )

# Sample 10 candidate questions for the passage.
results = generator(input_text,
                    do_sample=True,
                    max_length=64,
                    num_return_sequences=10,
                    )

for i, result in enumerate(results):
    print(f'{i + 1}. {result["generated_text"]}')
```

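The prompt template is language-agnostic, so the same passage can be turned into an input for every supported language. A minimal sketch (runs without downloading the model) that builds one prompt per language:

```python
# Build a query-generation prompt for each language the model supports.
lang2mT5 = dict(
    ar='Arabic', bn='Bengali', fi='Finnish', ja='Japanese',
    ko='Korean', ru='Russian', te='Telugu',
)
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = ('A transformer is a deep learning model that adopts the '
           'mechanism of self-attention.')

inputs = {code: PROMPT.format_map({'lang': name,
                                   'title': title,
                                   'passage': passage})
          for code, name in lang2mT5.items()}

print(inputs['ja'])
```

Each of the resulting strings can then be fed to the same `generator` call shown above.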
### BibTeX entry and citation info

```bibtex
@article{zhuang2022bridging,
  title={Bridging the gap between indexing and retrieval for differentiable search index with query generation},
  author={Zhuang, Shengyao and Ren, Houxing and Shou, Linjun and Pei, Jian and Gong, Ming and Zuccon, Guido and Jiang, Daxin},
  journal={arXiv preprint arXiv:2206.10128},
  year={2022}
}
```