---
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation

inference:
  parameters:
    do_sample: true
    max_length: 64
    top_k: 10
    temperature: 1
    num_return_sequences: 10
widget:
  - text: >-
      Generate a Japanese question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
    
  - text: >-
      Generate a Arabic question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
---

## Model description

An mT5-base query generation model trained on XOR QA data. Given a passage (and its title), it generates a question about that passage in a specified target language.

It was used in the papers [Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation](https://arxiv.org/pdf/2206.10128.pdf) and [Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval](https://arxiv.org/pdf/2305.03950.pdf).

### How to use
```python
from transformers import pipeline

# Target languages supported by the model, mapped to the language names
# used in the prompt template.
lang2mT5 = dict(
    ar='Arabic',
    bn='Bengali',
    fi='Finnish',
    ja='Japanese',
    ko='Korean',
    ru='Russian',
    te='Telugu'
)
# Prompt format the model expects.
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = 'A transformer is a deep learning model that adopts the mechanism of self-attention, differentially ' \
          'weighting the significance of each part of the input (which includes the recursive output) data.'


model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-base'
input_text = PROMPT.format_map({'lang': lang2mT5['ja'],
                                'title': title,
                                'passage': passage})

generator = pipeline(model=model_name_or_path,
                     task='text2text-generation',
                     device="cuda:0",  # set device=-1 to run on CPU
                     )

# Sample 10 candidate Japanese questions for the passage.
results = generator(input_text,
                    do_sample=True,
                    max_length=64,
                    num_return_sequences=10,
                    )

for i, result in enumerate(results):
    print(f'{i + 1}. {result["generated_text"]}')
```
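
If you need more control than the pipeline offers (for example, generating a question in every supported language for the same passage in one batch), the model can also be called through the lower-level `AutoTokenizer`/`AutoModelForSeq2SeqLM` API. The following is a minimal sketch, not part of the original recipe: the per-language batching and the sampling settings (mirroring the `top_k=10` in the widget config above) are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-base'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model.eval()

PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'
languages = ['Arabic', 'Bengali', 'Finnish', 'Japanese', 'Korean', 'Russian', 'Telugu']

title = 'Transformer (machine learning model)'
passage = ('A transformer is a deep learning model that adopts the mechanism of '
           'self-attention, differentially weighting the significance of each part '
           'of the input (which includes the recursive output) data.')

# One prompt per target language for the same passage.
prompts = [PROMPT.format(lang=lang, title=title, passage=passage) for lang in languages]

inputs = tokenizer(prompts, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(**inputs, do_sample=True, top_k=10, max_length=64)

for lang, query in zip(languages, tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f'{lang}: {query}')
```

Sampling here matches the inference settings in the card header; if you prefer deterministic output, drop `do_sample=True` and use beam search instead.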

### BibTeX entry and citation info

```bibtex
@article{zhuang2022bridging,
  title={Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation},
  author={Zhuang, Shengyao and Ren, Houxing and Shou, Linjun and Pei, Jian and Gong, Ming and Zuccon, Guido and Jiang, Daxin},
  journal={arXiv preprint arXiv:2206.10128},
  year={2022}
}

@inproceedings{zhuang2023augmenting,
  title={Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval},
  author={Zhuang, Shengyao and Shou, Linjun and Zuccon, Guido},
  booktitle={Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2023}
}
```