---
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation

inference:
  parameters:
    do_sample: true
    max_length: 64
    top_k: 10
    temperature: 1
    num_return_sequences: 10
widget:
  - text: >-
      Generate a Japanese question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
    
  - text: >-
      Generate a Arabic question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
---

## Model description

An mT5-base query generation model trained on XOR QA data. Given a passage and a target language, it generates questions in that language about the passage.

It is used in the papers [Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation](https://arxiv.org/pdf/2206.10128.pdf) and [Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval](https://arxiv.org/pdf/2305.03950.pdf).

### How to use
```python
from transformers import pipeline

# Language codes mapped to the language names used in the prompt.
lang2mT5 = dict(
    ar='Arabic',
    bn='Bengali',
    fi='Finnish',
    ja='Japanese',
    ko='Korean',
    ru='Russian',
    te='Telugu'
)
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = 'A transformer is a deep learning model that adopts the mechanism of self-attention, differentially ' \
          'weighting the significance of each part of the input (which includes the recursive output) data.'


model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-base'
input_text = PROMPT.format_map({'lang': lang2mT5['ja'],
                                'title': title,
                                'passage': passage})

# device="cuda:0" assumes a CUDA GPU is available; use device="cpu" otherwise.
generator = pipeline(model=model_name_or_path,
                     task='text2text-generation',
                     device="cuda:0",
                     )

# Sample 10 candidate questions, each at most 64 tokens long.
results = generator(input_text,
                    do_sample=True,
                    max_length=64,
                    num_return_sequences=10,
                    )

for i, result in enumerate(results):
    print(f'{i + 1}. {result["generated_text"]}')
```
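
Because the prompt template takes the language name as a field, the same passage can be turned into one prompt per language, and because sampling may return the same question more than once, the outputs can be deduplicated before use. The following is a minimal sketch of both steps. The helpers `build_prompts` and `unique_questions` are hypothetical names introduced here for illustration and are not part of the model or of `transformers`; the sketch only assumes the prompt format and the `generated_text` output key shown above.

```python
# Language names as they appear in the prompt (copied from the snippet above).
LANG_NAMES = {
    'ar': 'Arabic', 'bn': 'Bengali', 'fi': 'Finnish', 'ja': 'Japanese',
    'ko': 'Korean', 'ru': 'Russian', 'te': 'Telugu',
}
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'


def build_prompts(title, passage, lang_codes=None):
    """Hypothetical helper: return {lang_code: prompt} for the requested languages."""
    codes = list(LANG_NAMES) if lang_codes is None else lang_codes
    return {code: PROMPT.format(lang=LANG_NAMES[code], title=title, passage=passage)
            for code in codes}


def unique_questions(results):
    """Hypothetical helper: drop empty and repeated samples, keeping first-seen order."""
    seen = set()
    unique = []
    for result in results:
        text = result['generated_text'].strip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return unique


prompts = build_prompts(
    'Transformer (machine learning model)',
    'A transformer is a deep learning model that adopts the mechanism of self-attention.',
)
print(prompts['ko'])
```

With the `generator` pipeline from the snippet above, each prompt could then be passed through `generator(prompt, do_sample=True, max_length=64, num_return_sequences=10)` and the result handed to `unique_questions`.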

### BibTeX entry and citation info

```bibtex
@article{zhuang2022bridging,
  title={Bridging the gap between indexing and retrieval for differentiable search index with query generation},
  author={Zhuang, Shengyao and Ren, Houxing and Shou, Linjun and Pei, Jian and Gong, Ming and Zuccon, Guido and Jiang, Daxin},
  journal={arXiv preprint arXiv:2206.10128},
  year={2022}
}
```