ielabgroup/xor-tydi-docTquery-mt5-large


---
license: apache-2.0
library_name: transformers
pipeline_tag: text2text-generation

inference:
  parameters:
    do_sample: true
    max_length: 64
    top_k: 10
    temperature: 1
    num_return_sequences: 10
widget:
  - text: >-
      Generate a Japanese question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
    example_title: Generate Japanese questions

  - text: >-
      Generate a Arabic question for this passage: Transformer (machine learning model) A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input (which includes the recursive output) data.
    example_title: Generate Arabic questions
---

## Model description

An mT5-large query generation model trained on XOR QA data.

It is used in the papers [Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation](https://arxiv.org/pdf/2206.10128.pdf) and "Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval".

### How to use
```python
from transformers import pipeline

# Language codes mapped to the language names used in the prompt.
lang2mT5 = dict(
    ar='Arabic',
    bn='Bengali',
    fi='Finnish',
    ja='Japanese',
    ko='Korean',
    ru='Russian',
    te='Telugu'
)
PROMPT = 'Generate a {lang} question for this passage: {title} {passage}'

title = 'Transformer (machine learning model)'
passage = 'A transformer is a deep learning model that adopts the mechanism of self-attention, differentially ' \
          'weighting the significance of each part of the input (which includes the recursive output) data.'


model_name_or_path = 'ielabgroup/xor-tydi-docTquery-mt5-large'
input_text = PROMPT.format_map({'lang': lang2mT5['ja'],
                                'title': title,
                                'passage': passage})

# Load the model on the first GPU; change `device` if no GPU is available.
generator = pipeline(model=model_name_or_path,
                     task='text2text-generation',
                     device="cuda:0",
                     )

# Sample 10 candidate questions of at most 64 tokens each.
results = generator(input_text,
                    do_sample=True,
                    max_length=64,
                    num_return_sequences=10,
                    )

for i, result in enumerate(results):
    print(f'{i + 1}. {result["generated_text"]}')
```
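
The prompt format and sampling settings above can be wrapped in small helpers when generating questions for several languages. The following is only a sketch: `build_prompt`, `generate_queries`, and `SAMPLING_KWARGS` are hypothetical names introduced here, not part of the model or of `transformers`. The sampling values are copied from this card's `inference.parameters` metadata (which, unlike the usage snippet, also sets `top_k` and `temperature`), and removing exact duplicates is an assumption about what is useful downstream, since sampling can return the same question more than once.

```python
# Language codes from the usage example, mapped to the names used in the prompt.
LANG2MT5 = {
    "ar": "Arabic",
    "bn": "Bengali",
    "fi": "Finnish",
    "ja": "Japanese",
    "ko": "Korean",
    "ru": "Russian",
    "te": "Telugu",
}
PROMPT = "Generate a {lang} question for this passage: {title} {passage}"

# Sampling settings mirrored from the card's `inference.parameters` metadata.
SAMPLING_KWARGS = {
    "do_sample": True,
    "max_length": 64,
    "top_k": 10,
    "temperature": 1.0,
    "num_return_sequences": 10,
}


def build_prompt(lang_code, title, passage):
    """Hypothetical helper: format the prompt for one language code."""
    if lang_code not in LANG2MT5:
        raise ValueError(f"Unsupported language code: {lang_code!r}")
    return PROMPT.format(lang=LANG2MT5[lang_code], title=title, passage=passage)


def generate_queries(generator, lang_code, title, passage, **overrides):
    """Hypothetical helper: sample questions and drop exact duplicates, keeping order.

    `generator` is any callable that behaves like the text2text-generation
    pipeline: it takes a prompt plus keyword arguments and returns a list of
    dicts with a "generated_text" key.
    """
    kwargs = {**SAMPLING_KWARGS, **overrides}
    results = generator(build_prompt(lang_code, title, passage), **kwargs)
    # dict.fromkeys keeps the first occurrence of each question, in order.
    return list(dict.fromkeys(r["generated_text"].strip() for r in results))
```

With the `generator` pipeline from the usage example, `generate_queries(generator, 'ko', title, passage)` would return up to 10 distinct Korean questions; keyword overrides such as `num_return_sequences=5` replace the defaults for a single call.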

### BibTeX entry and citation info

```bibtex
@article{zhuang2022bridging,
  title={Bridging the gap between indexing and retrieval for differentiable search index with query generation},
  author={Zhuang, Shengyao and Ren, Houxing and Shou, Linjun and Pei, Jian and Gong, Ming and Zuccon, Guido and Jiang, Daxin},
  journal={arXiv preprint arXiv:2206.10128},
  year={2022}
}
```