Maciel/T5Corrector-base-v2


---
language:
- zh
license: apache-2.0
tags:
- t5
- text error correction
widget:
- text: "今天天气不太好，我的心情也不是很偷快"
  example_title: "Example 1"
- text: "能不能帮我买点淇淋，好久没吃了。"
  example_title: "Example 2"
- text: "脑子有点胡涂了，这道题冥冥学过还没有做出来"
  example_title: "Example 3"
inference:
  parameters:
    max_length: 256
    num_beams: 10
    no_repeat_ngram_size: 5
    do_sample: True
    early_stopping: True
---

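## Overview

T5Corrector: a Chinese text correction model for phonetic (sound-alike) and glyph (look-alike) errors

This model is fine-tuned from mengzi-t5-base for text error correction. Starting from more than 20 million sentences, a parallel correction corpus was built by replacing words with homophones, near-homophones, and visually similar characters, by randomly inserting phrases, by deleting some of the characters in a phrase, and by shuffling the order of characters and words. The resulting corpus contains more than 200 million sentence pairs, and the model was trained for a total of 66,000 steps.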
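
The corpus construction described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the `CONFUSIONS` table, the four operation functions, and `make_pair` are hypothetical names, the confusion sets are tiny stand-ins for real homophone and similar-shape dictionaries, and the operations work on single characters rather than on segmented phrases.

```python
import random

# Hypothetical, tiny confusion sets for illustration only; a real pipeline
# would load much larger homophone and similar-shape dictionaries.
CONFUSIONS = {
    "明": ["冥"],  # same pronunciation
    "糊": ["胡"],  # same pronunciation
    "愉": ["偷"],  # similar shape
}

def substitute(chars, rng):
    """Replace one character that has a confusion entry, if there is one."""
    candidates = [i for i, c in enumerate(chars) if c in CONFUSIONS]
    if candidates:
        i = rng.choice(candidates)
        chars[i] = rng.choice(CONFUSIONS[chars[i]])
    return chars

def delete(chars, rng):
    """Drop one random character (simplified stand-in for partial phrase deletion)."""
    if len(chars) > 1:
        del chars[rng.randrange(len(chars))]
    return chars

def insert(chars, rng):
    """Duplicate one random character (simplified stand-in for phrase insertion)."""
    if chars:
        i = rng.randrange(len(chars))
        chars.insert(i, chars[i])
    return chars

def swap(chars, rng):
    """Swap two adjacent characters to simulate a word-order error."""
    if len(chars) > 1:
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return chars

OPERATIONS = [substitute, delete, insert, swap]

def make_pair(sentence, rng):
    """Return a (noisy, clean) training pair built with one random operation."""
    op = rng.choice(OPERATIONS)
    noisy = "".join(op(list(sentence), rng))
    return noisy, sentence

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_pair("我的心情也不是很愉快", rng))
```

The noisy sentence serves as the model input and the clean sentence as the target, which is how a seq2seq corrector of this kind is usually trained.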

<a href='https://github.com/Macielyoung/T5Corrector'>GitHub repository</a>



Load the model:

```python
# Load the tokenizer and model from the Hugging Face Hub
from transformers import AutoTokenizer, T5ForConditionalGeneration
pretrained = "Maciel/T5Corrector-base-v2"
tokenizer = AutoTokenizer.from_pretrained(pretrained)
model = T5ForConditionalGeneration.from_pretrained(pretrained)
```

Run inference with the model:
```python
import torch

# Reuses the tokenizer and model loaded in the previous block.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def correct(text, max_length):
    """Return the corrected text; input and output are capped at max_length tokens."""
    model_inputs = tokenizer(text,
                             max_length=max_length,
                             truncation=True,
                             return_tensors="pt").to(device)
    output = model.generate(**model_inputs,
                            num_beams=5,
                            no_repeat_ngram_size=4,
                            do_sample=True,
                            early_stopping=True,
                            max_length=max_length,
                            return_dict_in_generate=True,
                            output_scores=True)
    # Decode the first returned sequence and drop special tokens.
    pred_output = tokenizer.batch_decode(output.sequences, skip_special_tokens=True)[0]
    return pred_output

# "毛台" is a deliberate misspelling of "茅台" (Moutai) for the model to fix.
text = "贵州毛台现在多少钱一瓶啊，想买两瓶尝尝味道。"
correction = correct(text, max_length=32)
print(correction)
```
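
Because `correct` tokenizes with `truncation=True`, any input longer than `max_length` tokens loses its tail. One possible workaround, sketched below, is to split a long passage at sentence-ending punctuation and correct each sentence separately. The helpers `split_sentences` and `correct_long` are hypothetical names introduced here, and the punctuation set is an assumption. The correction function is passed in as an argument so the sketch runs without loading the model.

```python
import re

# Sentence-ending punctuation, full-width and ASCII (an assumed set).
_END = "。！？!?"
# A piece is either text followed by its closing punctuation,
# or a trailing run of text with no closing punctuation.
_PIECE = re.compile(rf"[^{_END}]*[{_END}]+|[^{_END}]+")

def split_sentences(text):
    """Split text into sentence pieces; joining the pieces restores the text."""
    return _PIECE.findall(text)

def correct_long(text, correct_fn, max_length=128):
    """Correct each sentence on its own and join the results.

    correct_fn is any callable with the signature (text, max_length) -> str,
    such as the correct function defined above.
    """
    return "".join(correct_fn(s, max_length) for s in split_sentences(text))
```

With the model loaded, `correct_long(paragraph, correct)` applies the model sentence by sentence. The trade-off is that each sentence is corrected without the context of its neighbours.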



### Examples

```
Example 1:
input: 能不能帮我买点淇淋，好久没吃了。
output: 能不能帮我买点冰淇淋，好久没吃了。

Example 2:
input: 脑子有点胡涂了，这道题冥冥学过还没有做出来
output: 脑子有点糊涂了,这道题明明学过还没有做出来

Example 3:
input: 今天天气不太好，我的心情也不是很偷快
output: 今天天气不太好,我的心情也不是很愉快
```
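
In examples 2 and 3 the output uses an ASCII comma where the input had a full-width one, so a plain string comparison would report a punctuation change alongside the real corrections. The sketch below shows one way to list only the character-level corrections, using the standard-library `difflib`. The helper name `list_corrections` and the punctuation mapping are assumptions made for illustration, and the mapping would also rewrite any ASCII punctuation that was genuinely present in the input.

```python
import difflib

# Map ASCII punctuation to full-width forms before diffing (assumed mapping).
_PUNCT = str.maketrans({",": "，", "?": "？", "!": "！", ":": "：", ";": "；"})

def list_corrections(source, corrected):
    """Return (operation, original_span, corrected_span) for each changed span."""
    src = source.translate(_PUNCT)
    tgt = corrected.translate(_PUNCT)
    matcher = difflib.SequenceMatcher(None, src, tgt, autojunk=False)
    return [(tag, src[i1:i2], tgt[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

if __name__ == "__main__":
    print(list_corrections("今天天气不太好，我的心情也不是很偷快",
                           "今天天气不太好,我的心情也不是很愉快"))
    # [('replace', '偷', '愉')]
```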