---
language: zh
tags:
- pytorch
license: mit
---

# EVA

## Model Description

EVA is the largest open-source Chinese dialogue model, with up to 2.8B parameters. The 1.0 version is pre-trained on [WudaoCorpus-Dialog](https://resource.wudaoai.cn/home), and the 2.0 version is pre-trained on a carefully cleaned version of WudaoCorpus-Dialog, which yields better performance than the 1.0 version.

[Paper link](https://arxiv.org/abs/2108.01547) of EVA1.0.

[Paper link](https://arxiv.org/abs/2203.09313) of EVA2.0.

## Model Configuration

| Model         | n_params | n_enc-layers | n_dec-layers | d_model | d_ff  | n_heads | d_head | attn-scale |
| ------------- | -------- | ------------ | ------------ | ------- | ----- | ------- | ------ | ---------- |
| EVA1.0        | 2.8B     | 24           | 24           | 2,048   | 5,120 | 32      | 64     | No         |
| EVA2.0_Base   | 300M     | 12           | 12           | 768     | 3,072 | 12      | 64     | Yes        |
| EVA2.0_Large  | 970M     | 24           | 24           | 1,024   | 4,096 | 16      | 64     | Yes        |
| EVA2.0_xLarge | 2.8B     | 24           | 24           | 2,048   | 5,120 | 32      | 64     | Yes        |

## How to use

Please refer to our GitHub [repository](https://github.com/thu-coai/EVA/tree/huggingface).

## Performance

We fine-tune and evaluate the model on the KdConv dataset. With the default hyperparameters in the scripts, we obtain the following results on the test set:

| Model                      | loss | BLEU-4 | Dist-4 | F1    |
| -------------------------- | ---- | ------ | ------ | ----- |
| EVA1.0, before fine-tuning | 3.49 | 0.64   | 85.94  | 13.24 |
| EVA1.0, after fine-tuning  | 2.37 | 3.94   | 72.47  | 19.80 |
| EVA2.0, before fine-tuning | 3.14 | 1.02   | 82.67  | 14.36 |
| EVA2.0, after fine-tuning  | 2.09 | 4.69   | 73.03  | 22.91 |

We also run interactive inference with EVA2.0 using beam search:

```
Usr >>> 你好
Sys >>> 你好,请问你是?
Usr >>> 我是小明
Sys >>> 好的,小明,我是新来的,请多关照
Usr >>> 你了解人工智能吗?
Sys >>> 不了解,我只知道我是机器人
Usr >>> 那你很聪明
Sys >>> 谢谢夸奖,我会继续努力的
Usr >>> 你是学什么专业的呀
Sys >>> 计算机科学与技术,你呢?
Usr >>> 我也是学计算机的
Sys >>> 哦,那你有什么想知道的,可以问我
Usr >>> 编程语言你学过哪些?
Sys >>> java,c,c+都学过
Usr >>> 你学的真多
Sys >>> 呵呵,还好吧,我比较喜欢写代码
Usr >>> 哈哈,我也喜欢写代码,以后可以多多交流
Sys >>> 嗯,好的
Usr >>> 拜拜喽
Sys >>> 拜~
```

**NOTE**: Since different hardware may implement random functions differently, you may not be able to reproduce this exact conversation even with the same random seed as ours. The overall performance, however, should not differ much.

## Disclaimer

The pre-trained models aim to facilitate research on conversation generation. The model provided in this repository is trained on a large dataset collected from various sources. Although a rigorous cleaning and filtering process has been applied to the data and the model output, there is no guarantee that all inappropriate content has been completely removed. Content generated by the model does not represent the authors' opinions. The decoding script provided in this repository is for research purposes only. We are not responsible for any content generated using our model.

## Citation

```
@article{coai2021eva,
  title={EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training},
  author={Zhou, Hao and Ke, Pei and Zhang, Zheng and Gu, Yuxian and Zheng, Yinhe and Zheng, Chujie and Wang, Yida and Wu, Chen Henry and Sun, Hao and Yang, Xiaocong and Wen, Bosi and Zhu, Xiaoyan and Huang, Minlie and Tang, Jie},
  journal={arXiv preprint arXiv:2108.01547},
  year={2021}
}

@article{coai2022eva2,
  title={{EVA2.0}: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training},
  author={Gu, Yuxian and Wen, Jiaxin and Sun, Hao and Song, Yi and Ke, Pei and Zheng, Chujie and Zhang, Zheng and Yao, Jianzhu and Zhu, Xiaoyan and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2203.09313},
  year={2022}
}
```
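
As a quick cross-check of the Model Configuration table, the sketch below verifies that `n_heads * d_head` equals `d_model` for every listed model. The numbers are copied from the table; the dictionary layout and the function names (`CONFIGS`, `attention_width`, `heads_match_width`) are hypothetical and are not part of the EVA repository.

```python
# Values copied from the Model Configuration table in this README.
# The structure and names below are illustrative assumptions, not EVA's own config format.
CONFIGS = {
    "EVA1.0":        {"d_model": 2048, "d_ff": 5120, "n_heads": 32, "d_head": 64},
    "EVA2.0_Base":   {"d_model": 768,  "d_ff": 3072, "n_heads": 12, "d_head": 64},
    "EVA2.0_Large":  {"d_model": 1024, "d_ff": 4096, "n_heads": 16, "d_head": 64},
    "EVA2.0_xLarge": {"d_model": 2048, "d_ff": 5120, "n_heads": 32, "d_head": 64},
}


def attention_width(cfg):
    """Total width of the concatenated attention heads."""
    return cfg["n_heads"] * cfg["d_head"]


def heads_match_width(configs):
    """Map each model name to whether n_heads * d_head equals d_model."""
    return {name: attention_width(cfg) == cfg["d_model"] for name, cfg in configs.items()}


if __name__ == "__main__":
    for name, ok in heads_match_width(CONFIGS).items():
        status = "consistent" if ok else "mismatch"
        print(f"{name}: n_heads * d_head = {attention_width(CONFIGS[name])} ({status})")
```

For all four rows the product matches `d_model`, so the attention projections keep the hidden width unchanged in each configuration.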