|
--- |
|
license: apache-2.0 |
|
|
|
language: zh |
|
inference: false |
|
tags: |
|
- text-generation |
|
- dialogue-generation |
|
- pytorch |
|
- inference acceleration |
|
- gpt2 |
|
- gpt3 |
|
--- |
|
# YuYan-Dialogue |
|
|
|
YuYan is a series of Chinese language models of different sizes, developed by the Fuxi AI Lab, NetEase Inc. They are trained on a large, high-quality Chinese novel dataset.
|
|
|
YuYan belongs to the same family of decoder-only models as [GPT2 and GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained with the self-supervised causal language modeling objective.
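For reference, the standard causal language modeling loss (not specific to YuYan) is the negative log-likelihood of each token given its left context:

``` latex
% Standard causal language modeling objective: the model is trained to
% predict token x_t from the preceding tokens x_{<t}.
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```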
|
|
|
YuYan-Dialogue is a dialogue model obtained by fine-tuning YuYan-11b on a large, high-quality multi-turn dialogue dataset. It has strong conversation generation capabilities.
|
|
|
## Model Inference Acceleration |
|
|
|
As the model size increases, inference takes longer and requires more computational resources.
|
|
|
Therefore, we developed our own transformer inference acceleration framework, [EET](https://github.com/NetEase-FuXi/EET.git). More details can be found in [Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model](https://aclanthology.org/2022.naacl-industry.8/).
|
|
|
We combine our language model with the EET inference framework to provide industrial-grade inference performance.
|
|
|
## How to use |
|
|
|
Our model is trained with [fairseq](https://github.com/facebookresearch/fairseq), so both inference and fine-tuning depend on it.
|
|
|
For inference, we modify some parts of the original fairseq code, mainly in
|
> fairseq-0.12.2/fairseq/sequence_generator.py |
|
|
|
We integrate EET with the sequence generator. We replace the eos token with a token that is unlikely to be sampled, to ensure the generated text reaches the desired length. The repetition penalty trick is also modified; you can change the penalty strength by adjusting the value of `self.ban_weight`. Both tricks are sketched below.
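For illustration only, here is a minimal sketch of the two decoding tricks just described. It is not the actual patch to `sequence_generator.py`; the function and argument names (`adjust_logits`, `eos_id`, `ban_weight`) are assumptions made for this example.

``` python
import torch

def adjust_logits(logits: torch.Tensor, prev_tokens: torch.Tensor,
                  eos_id: int, ban_weight: float = 1.0) -> torch.Tensor:
    """Sketch of the two decoding tricks described above.

    logits:      (batch, vocab) next-token scores
    prev_tokens: (batch, seq)   tokens generated so far
    """
    # Trick 1: suppress eos so decoding continues until the desired length
    # (the patch swaps eos for a token that is unlikely to be sampled).
    logits[:, eos_id] = float("-inf")

    # Trick 2: repetition penalty; down-weight tokens already generated.
    # ban_weight plays the role of self.ban_weight in the modified generator.
    for b in range(logits.size(0)):
        logits[b, prev_tokens[b]] -= ban_weight

    return logits
```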
|
|
|
Then, to keep the eos token in the final generated text, we change `include_eos=False` to `include_eos=True` on line 75 of
|
> fairseq-0.12.2/fairseq/data/dictionary.py |
|
|
|
Finally, to allow passing parameters from Python scripts, we remove lines 67 to 69 in
|
> fairseq-0.12.2/fairseq/dataclass/utils.py
|
|
|
Below is the installation tutorial.
|
|
|
``` bash
# install pytorch
pip install torch==1.8.1

# install fairseq
unzip fairseq-0.12.2.zip
cd fairseq-0.12.2
pip install .

# install EET
git clone https://github.com/NetEase-FuXi/EET.git
cd EET
pip install .

# install transformers (an EET requirement)
pip install transformers==4.23

# make a folder and move the dictionary file and the model files into it
mkdir transformer_lm_gpt2_xxl_dialogue
mv dict.txt transformer_lm_gpt2_xxl_dialogue/
mv checkpoint_best_part_*.pt transformer_lm_gpt2_xxl_dialogue/
```
|
`inference.py` is a script that provides an interface to initialize the EET object and the sequence generator. It includes some preprocessing and postprocessing functions for text input and output; you can modify the script according to your needs.
|
|
|
In addition, it provides a simple `Dialogue` object to manage dialogue generation and the dialogue history.
|
|
|
Once the environment is ready, a few lines of code are enough to run inference.
|
|
|
``` python
from inference import Inference, Dialogue  # both are provided by inference.py

model_path = "transformer_lm_gpt2_xxl_dialogue/checkpoint_best.pt"
data_path = "transformer_lm_gpt2_xxl_dialogue"

# max inference batch size; adjust according to CUDA memory (40GB of GPU memory is necessary)
eet_batch_size = 10

inference = Inference(model_path, data_path, eet_batch_size)
dialogue_model = Dialogue(inference)

dialogue_model.get_repsonse("你好啊")  # "Hi there!"
```
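For multi-turn conversations, here is a sketch of how the `Dialogue` object might be used, assuming it accumulates the conversation history internally; it continues the snippet above, and the method name follows it.

``` python
# Hypothetical multi-turn usage: each call is conditioned on the earlier
# turns because Dialogue keeps the conversation history.
dialogue_model.get_repsonse("你好啊")            # "Hi there!"
dialogue_model.get_repsonse("今天天气怎么样？")    # "How's the weather today?"
```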
|
## Citation |
|
If you find the technical report or the resources useful, please cite the following paper:
|
- https://aclanthology.org/2022.naacl-industry.8/ |
|
``` bibtex
|
@inproceedings{li-etal-2022-easy, |
|
title = "Easy and Efficient Transformer: Scalable Inference Solution For Large {NLP} Model", |
|
author = "Li, Gongzheng and |
|
Xi, Yadong and |
|
Ding, Jingzhen and |
|
Wang, Duan and |
|
Luo, Ziyang and |
|
Zhang, Rongsheng and |
|
Liu, Bai and |
|
Fan, Changjie and |
|
Mao, Xiaoxi and |
|
Zhao, Zeng", |
|
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track", |
|
month = jul, |
|
year = "2022", |
|
address = "Hybrid: Seattle, Washington + Online", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://aclanthology.org/2022.naacl-industry.8", |
|
doi = "10.18653/v1/2022.naacl-industry.8", |
|
pages = "62--68" |
|
} |
|
|
|
``` |
|
## Contact Us |
|
You can contact us by email:
|
|
|
xiyadong@corp.netease.com, dingjingzhen@corp.netease.com
|
|