pipesanma
/

chasquilla-question-generator

Text2Text Generation

Inference Endpoints

text-generation-inference

Model card Files Files and versions Community

chasquilla-question-generator / README.md

pipesanma's picture

update version models

0ebd40f 8 months ago

|

raw history blame contribute delete

No virus

2.43 kB

	---
	license: apache-2.0
	datasets:
	- squad
	language:
	- en
	---


	# Question Generator

	This model should be used to generate questions based on a given string.

	### Out-of-Scope Use

	English language support only.

	## How to Get Started with the Model

	Use the code below to get started with the model.

	```python
	import torch
	from transformers import T5ForConditionalGeneration, T5Tokenizer

	def question_parser(question: str) -> str:
	return " ".join(question.split(":")[1].split())

	def generate_questions_v2(context: str, answer: str, n_questions: int = 1):
	model = T5ForConditionalGeneration.from_pretrained(
	"pipesanma/chasquilla-question-generator"
	)
	tokenizer = T5Tokenizer.from_pretrained("pipesanma/chasquilla-question-generator")

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = model.to(device)
	text = "context: " + context + " " + "answer: " + answer + " </s>"

	encoding = tokenizer.encode_plus(
	text, max_length=512, padding=True, return_tensors="pt"
	)
	input_ids, attention_mask = encoding["input_ids"].to(device), encoding[
	"attention_mask"
	].to(device)

	model.eval()
	beam_outputs = model.generate(
	input_ids=input_ids,
	attention_mask=attention_mask,
	max_length=72,
	early_stopping=True,
	num_beams=5,
	num_return_sequences=n_questions,
	)

	questions = []

	for beam_output in beam_outputs:
	sent = tokenizer.decode(
	beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True
	)
	print(sent)
	questions.append(question_parser(sent))

	return questions


	context = "President Donald Trump said and predicted that some states would reopen this month."
	answer = "Donald Trump"

	questions = generate_questions_v2(context, answer, 1)
	print(questions)
	```

	## Training Details

	### Dataset generation

	The dataset is "squad" from datasets library.

	Check the [utils/dataset_gen.py](utils/dataset_gen.py) file for the dataset generation.

	### Training model

	Check the [utils/t5_train_model.py](utils/t5_train_model.py) file for the training process

	### Model and Tokenizer versions

	(v1.0) Model and Tokenizer V1: trained with 1000 rows


	(v1.1) Model and Tokenizer V2: trained with 3000 rows

	(v1.2) Model and Tokenizer V3: trained with all rows from datasets (78664 rows-train, 9652 rows-validation)