|
--- |
|
license: apache-2.0 |
|
|
|
language: zh |
|
inference: false |
|
tags: |
|
- text-generation |
|
- dialogue-generation |
|
- pytorch |
|
- inference acceleration |
|
- gpt2 |
|
- gpt3 |
|
--- |
|
# YuYan-Dialogue |
|
|
|
YuYan is a series of Chinese language models of different sizes, developed by the Fuxi AI Lab, NetEase Inc. They are trained on a large, high-quality Chinese novel dataset.
|
|
|
YuYan belongs to the same family of decoder-only models as [GPT2 and GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained with the self-supervised causal language modeling objective.
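For reference, the standard causal language modeling loss (not specific to YuYan) is the negative log-likelihood of each token given its left context:

``` latex
% Standard causal language modeling objective: the model is trained to
% predict token x_t from the preceding tokens x_{<t}.
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```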
|
|
|
YuYan-Dialogue is a dialogue model obtained by fine-tuning YuYan-11b on a large, high-quality multi-turn dialogue dataset. It has strong conversation generation capabilities.
|
|
|
## Model Inference Acceleration |
|
|
|
As the model size increases, inference takes longer and requires more computational resources.
|
|
|
Therefore, we developed our own transformer inference acceleration framework, [EET](https://github.com/NetEase-FuXi/EET.git). More details can be found in [Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model](https://aclanthology.org/2022.naacl-industry.8/).
|
|
|
We combine our language model with the EET inference framework to provide industrial-grade inference performance.
|
|
|
## How to use |
|
|
|
Our model is trained with [fairseq](https://github.com/facebookresearch/fairseq), so both inference and fine-tuning depend on it.
|
|
|
For inference, we modify some parts of the original fairseq code, mainly in
|
> fairseq-0.12.2/fairseq/sequence_generator.py |
|
|
|
We integrate EET with the sequence generator. We replace the eos token with a token that is unlikely to be sampled, to ensure the generated text reaches the desired length. The repetition penalty trick is also modified; you can change the penalty strength by adjusting the value of `self.ban_weight`. Both tricks are sketched below.
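For illustration only, here is a minimal sketch of the two decoding tricks just described. It is not the actual patch to `sequence_generator.py`; the function and argument names (`adjust_logits`, `eos_id`, `ban_weight`) are assumptions made for this example.

``` python
import torch

def adjust_logits(logits: torch.Tensor, prev_tokens: torch.Tensor,
                  eos_id: int, ban_weight: float = 1.0) -> torch.Tensor:
    """Sketch of the two decoding tricks described above.

    logits:      (batch, vocab) next-token scores
    prev_tokens: (batch, seq)   tokens generated so far
    """
    # Trick 1: suppress eos so decoding continues until the desired length
    # (the patch swaps eos for a token that is unlikely to be sampled).
    logits[:, eos_id] = float("-inf")

    # Trick 2: repetition penalty; down-weight tokens already generated.
    # ban_weight plays the role of self.ban_weight in the modified generator.
    for b in range(logits.size(0)):
        logits[b, prev_tokens[b]] -= ban_weight

    return logits
```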
|
|
|
Then, to keep the eos token in the final generated text, we change `include_eos=False` to `include_eos=True` on line 75 of
|
> fairseq-0.12.2/fairseq/data/dictionary.py |
|
|
|
Finally, to allow passing parameters from Python scripts, we remove lines 67 to 69 in
|
> fairseq-0.12.2/fairseq/dataclass/utils.py
|
|
|
Below is the installation tutorial.
|
|
|
``` bash
# install pytorch
pip install torch==1.8.1

# install fairseq
unzip fairseq-0.12.2.zip
cd fairseq-0.12.2
pip install .

# install EET
git clone https://github.com/NetEase-FuXi/EET.git
cd EET
pip install .

# install transformers (an EET requirement)
pip install transformers==4.23

# make a folder and move the dictionary file and the model files into it
mkdir transformer_lm_gpt2_xxl_dialogue
mv dict.txt transformer_lm_gpt2_xxl_dialogue/
mv checkpoint_best_part_*.pt transformer_lm_gpt2_xxl_dialogue/
```
|
`inference.py` is a script that provides an interface to initialize the EET object and the sequence generator. It includes some preprocessing and postprocessing functions for text input and output; you can modify the script according to your needs.
|
|
|
In addition, it provides a simple `Dialogue` object to manage dialogue generation and the dialogue history.
|
|
|
Once the environment is ready, a few lines of code are enough to run inference.
|
|
|
``` python
from inference import Inference, Dialogue  # both are provided by inference.py

model_path = "transformer_lm_gpt2_xxl_dialogue/checkpoint_best.pt"
data_path = "transformer_lm_gpt2_xxl_dialogue"

# max inference batch size; adjust according to CUDA memory (40GB of GPU memory is necessary)
eet_batch_size = 10

inference = Inference(model_path, data_path, eet_batch_size)
dialogue_model = Dialogue(inference)

dialogue_model.get_repsonse("你好啊")  # "Hi there!"
```
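For multi-turn conversations, here is a sketch of how the `Dialogue` object might be used, assuming it accumulates the conversation history internally; it continues the snippet above, and the method name follows it.

``` python
# Hypothetical multi-turn usage: each call is conditioned on the earlier
# turns because Dialogue keeps the conversation history.
dialogue_model.get_repsonse("你好啊")            # "Hi there!"
dialogue_model.get_repsonse("今天天气怎么样？")    # "How's the weather today?"
```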
|
## Citation |
|
If you find the technical report or the resources useful, please cite the following paper:
|
- https://aclanthology.org/2022.naacl-industry.8/ |
|
``` bibtex
|
@inproceedings{li-etal-2022-easy, |
|
title = "Easy and Efficient Transformer: Scalable Inference Solution For Large {NLP} Model", |
|
author = "Li, Gongzheng and |
|
Xi, Yadong and |
|
Ding, Jingzhen and |
|
Wang, Duan and |
|
Luo, Ziyang and |
|
Zhang, Rongsheng and |
|
Liu, Bai and |
|
Fan, Changjie and |
|
Mao, Xiaoxi and |
|
Zhao, Zeng", |
|
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track", |
|
month = jul, |
|
year = "2022", |
|
address = "Hybrid: Seattle, Washington + Online", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://aclanthology.org/2022.naacl-industry.8", |
|
doi = "10.18653/v1/2022.naacl-industry.8", |
|
pages = "62--68" |
|
} |
|
|
|
``` |
|
## Contact Us |
|
You can contact us by email:
|
|
|
xiyadong@corp.netease.com, dingjingzhen@corp.netease.com
|
|