席亚东 committed
Commit: 05e5527
Parent(s): cc5bc33
first commit

Browse files
- README.md +118 -0
- dict.txt +0 -0
- inference.py +181 -0
README.md
CHANGED
@@ -1,3 +1,121 @@
---
license: apache-2.0
language: zh
inference: false
tags:
- text-generation
- story-generation
- pytorch
- inference acceleration
- gpt2
- gpt3
---

# YuYan: Pre-training of Language Models for Story Generation

YuYan is a series of Chinese language models of different sizes, developed by the Fuxi AI Lab, NetEase Inc. They are trained on a large, high-quality Chinese novel dataset.

YuYan belongs to the same family of decoder-only models as [GPT2 and GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained with the self-supervised causal language modeling objective.

Because the training data consists mainly of novels, the model is good at generating the next plot given the story context.

## Model Inference Acceleration

As the model size increases, inference time grows and more computational resources are required.

Therefore, we developed our own transformer model inference acceleration framework, [EET](https://github.com/NetEase-FuXi/EET.git). More details are in [Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model](https://aclanthology.org/2022.naacl-industry.8/).

We combine our language model with the EET inference framework to provide industrial-grade inference performance.

## How to use

Our model is trained with [fairseq](https://github.com/facebookresearch/fairseq). As a result, both inference and fine-tuning depend on it.

For inference, we modify some parts of the original fairseq code, mainly
> fairseq-0.12.2/fairseq/sequence_generator.py

We integrate EET with the sequence generator. We replace the eos token with a token that is unlikely to be sampled, to ensure the generated text reaches the desired length. The repetition penalty trick is also modified; you can change the penalty strength by adjusting the value of `self.ban_weight`.
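
The following is only a minimal sketch of what these two tricks can look like inside a single sampling step; apart from `self.ban_weight`, every name here (`adjust_lprobs`, `unlikely_idx`, and so on) is illustrative and not taken from the modified file.

```python
import torch

def adjust_lprobs(lprobs, prev_tokens, eos_idx, unlikely_idx, ban_weight=1.0):
    """Illustrative sketch of the two generation tricks described above.

    lprobs:      (batch, vocab) next-token log-probabilities
    prev_tokens: (batch, seq)   tokens generated so far
    """
    # 1. Redirect the probability mass of </s> to a token that is unlikely to be
    #    sampled, so generation does not stop before the desired length.
    lprobs[:, unlikely_idx] = lprobs[:, eos_idx]
    lprobs[:, eos_idx] = -float("inf")

    # 2. Repetition penalty: subtract a ban weight (cf. self.ban_weight in the
    #    modified sequence_generator.py) from tokens that have already appeared.
    penalty = torch.zeros_like(lprobs)
    penalty.scatter_(1, prev_tokens, ban_weight)
    return lprobs - penalty
```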

Then, to keep the eos token in the final generated text, we change `include_eos=False` to `include_eos=True` on line 75 of
> fairseq-0.12.2/fairseq/data/dictionary.py

Finally, to pass parameters in from Python scripts, we remove lines 67-69 in
> fairseq-0.12.2/fairseq/dataclass/utils.py

Below is the installation tutorial.

```
# install pytorch
pip install torch==1.8.1

# install fairseq
unzip fairseq-0.12.2.zip
cd fairseq-0.12.2
pip install .

# install EET
git clone https://github.com/NetEase-FuXi/EET.git
cd EET
pip install .

# install transformers (an EET requirement)
pip install transformers==4.23

# make a folder and move the dictionary file and model files into it
mkdir transformer_lm_gpt2_xxl
mv dict.txt transformer_lm_gpt2_xxl/
mv checkpoint_best_part_*.pt transformer_lm_gpt2_xxl/
```
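
As an optional sanity check (not part of the original instructions), you can verify that the three packages import correctly before moving on:

```
python -c "import torch, fairseq, eet; print(torch.__version__, fairseq.__version__)"
```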

`inference.py` is a script that provides an interface to initialize the EET object and the sequence generator. In addition, it includes some pre-processing and post-processing functions for text input and output. You can modify the script according to your needs.

Once the environment is ready, inference takes only a few lines of code.

```python
from inference import Inference
model_path = "transformer_lm_gpt2_xxl/checkpoint_best.pt"
data_path = "transformer_lm_gpt2_xxl"
eet_batch_size = 10  # max inference batch size; adjust according to CUDA memory, 40GB of memory is necessary
inference = Inference(model_path, data_path, eet_batch_size)

inp = "田园一听这话,轻挑的嘴角放了下来,两腿叉开,踱着方步,跨过汤婆子,一屁股坐在了老人面前。</s>刘萌和健军一左一右站在他身旁,像是王朝、马汉护着包公断案。"
text = inference([inp] * 10, append_right_eos=True)
```

This interface supports batched inputs, so if you need multiple results for a single input, you can repeat that input in the batch. It also supports generating results for several different inputs at once, e.g.

```python
text = inference(["四个月后,正是草长花秾的暮春季节。</s>令狐冲和盈盈新婚燕尔,携手共赴华山。", "院子中传来急促的脚步声,他停下手中的招式,将开元刀插入刀鞘。"])
```
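
Based on the return format in `inference.py`, the call returns one list per input, and each list contains `[generated_text, score]` pairs, so the outputs can be read back like this:

```python
for i, candidates in enumerate(text):
    for generated_text, score in candidates:
        print(i, score, generated_text)
```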

## Citation

If you find the technical report or these resources useful, please cite the following technical report in your paper.
- https://aclanthology.org/2022.naacl-industry.8/

```
@inproceedings{li-etal-2022-easy,
    title = "Easy and Efficient Transformer: Scalable Inference Solution For Large {NLP} Model",
    author = "Li, Gongzheng and
      Xi, Yadong and
      Ding, Jingzhen and
      Wang, Duan and
      Luo, Ziyang and
      Zhang, Rongsheng and
      Liu, Bai and
      Fan, Changjie and
      Mao, Xiaoxi and
      Zhao, Zeng",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track",
    month = jul,
    year = "2022",
    address = "Hybrid: Seattle, Washington + Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-industry.8",
    doi = "10.18653/v1/2022.naacl-industry.8",
    pages = "62--68"
}
```

## Contact Us

You can also contact us by email:

xiyadong@corp.netease.com, dingjingzhen@corp.netease
dict.txt
ADDED
The diff for this file is too large to render.
See raw diff
inference.py
ADDED
@@ -0,0 +1,181 @@
#!/usr/bin/env python3 -u

from collections import namedtuple

import math
import torch
from torch.nn.utils.rnn import pad_sequence

from fairseq import options, tasks, utils
from eet.fairseq.transformer import EETTransformerDecoder


Batch = namedtuple('Batch', 'ids src_tokens src_lengths')

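# Note: lines are encoded in reversed order, right-padded with the pad index (1),
# and then flipped back, which effectively left-pads every sequence so that all
# inputs end at the same position.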
def make_batches(lines, task, max_positions, encode_fn):

    tokens = [task.source_dictionary.encode_line(encode_fn(line),
                                                 add_if_not_exist=False,
                                                 append_eos=False,
                                                 reverse_order=True).long()
              for line in lines]
    lengths = [t.numel() for t in tokens]
    tokens = pad_sequence(tokens, batch_first=True,
                          padding_value=1).flip(dims=(1,))

    return Batch(ids=torch.arange(len(tokens)),
                 src_tokens=tokens,
                 src_lengths=torch.tensor(lengths))


def encode_fn(x_str):
    # split on </s> and insert spaces between characters (character-level tokens)
    x_str = x_str.replace(" ", "")
    x_str = x_str.split("</s>")
    x_str = " </s> ".join([" ".join(list(x)) for x in x_str])
    x_str = "</s> " + x_str
    return x_str


def decode_fn(x):
    x = x.replace(" ", "")
    return x


def eos_token_filter(sent):
    if "</s>" in sent:
        return True
    return False


def post_process(line):
    # drop the trailing, possibly unfinished segment after the last </s>
    line = "</s>".join(line.split("</s>")[:-1])
    return line


class Inference(object):

    def __init__(self, model_path, data_path, eet_batch_size):

        parser = options.get_generation_parser(
            default_task="language_modeling")
        args = options.parse_args_and_arch(parser)
        args.data = data_path
        args.path = model_path
        self.args = args

        # generation parameters
        args.beam = 1  # don't change
        args.min_len = 5
        args.max_len_b = 200
        args.lenpen = 1.0
        args.sampling = True
        args.sampling_topp = 0.8
        # args.sampling_topk = 20
        args.temperature = 0.8
        args.no_repeat_ngram_size = 1
        args.fp16 = True

        # Setup task (language modeling)
        task = tasks.setup_task(args)
        self.task = task
        # Set dictionaries
        self.src_dict = task.source_dictionary
        self.tgt_dict = task.target_dictionary

        use_cuda = torch.cuda.is_available() and not args.cpu
        self.use_cuda = use_cuda

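        # The released weights are split into three files (checkpoint_best_part_1/2/3.pt);
        # they are merged into a single checkpoint_best.pt here before the EET decoder is built.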
        model_path = args.path
        checkpoint = torch.load(model_path.replace("best.pt", "best_part_1.pt"))
        # NOTE: the original script passed the file paths of part 2 and part 3
        # directly to dict.update(); loading them first (assuming each part file
        # has the same {"model": state_dict} layout as part 1) is required for
        # the merge to work.
        checkpoint["model"].update(torch.load(
            model_path.replace("best.pt", "best_part_2.pt"))["model"])
        checkpoint["model"].update(torch.load(
            model_path.replace("best.pt", "best_part_3.pt"))["model"])
        torch.save(checkpoint, model_path)
        # load the merged checkpoint to recover the model config
        state = torch.load(args.path, map_location=torch.device("cpu"))
        cfg_args = eval(str(state["cfg"]))["model"]
        del cfg_args["_name"]
        keys_list = []
        values_list = []
        for key, value in cfg_args.items():
            keys_list.append(key)
            values_list.append(value)
        Model_args = namedtuple("Model_args", keys_list)
        model_args = Model_args._make(values_list)
        del state

        eet_seq_len = 1024  # max sequence length; (input length + generation length) shouldn't be larger than this
        data_type = torch.float16
        eet_config = {"data_type": data_type,
                      "max_batch": eet_batch_size,
                      "full_seq_len": eet_seq_len}
        print(model_args)

        eet_model = EETTransformerDecoder.from_fairseq_pretrained(model_id_or_path=args.path,
                                                                   dictionary=self.src_dict, args=model_args,
                                                                   config=eet_config,
                                                                   no_encoder_attn=True)
        self.models = [eet_model]
        # Initialize generator
        self.generator = task.build_generator(self.models, args)

        # Load alignment dictionary for unknown word replacement
        # (None if no unknown word replacement, empty if no path to align dictionary)
        self.align_dict = utils.load_align_dict(args.replace_unk)

        self.max_positions = 1024  # the model config
        self.eos_index = self.tgt_dict.eos()
        self.pad_index = self.tgt_dict.pad()

    def __call__(self, inputs, append_right_eos=True):

        results = []
        start_id = 0

        batch = make_batches(inputs, self.task, self.max_positions, encode_fn)

        src_tokens = batch.src_tokens
        src_lengths = batch.src_lengths
        # a new paragraph always starts after an eos token
        if src_tokens[0][-1].item() != self.eos_index and append_right_eos:
            src_tokens = torch.cat([src_tokens, src_tokens.new_ones(
                src_tokens.size(0), 1) * self.eos_index], dim=1)
            src_lengths += 1
        if self.use_cuda:
            src_tokens = src_tokens.cuda()
            src_lengths = src_lengths.cuda()
        sample = {
            'net_input': {
                'src_tokens': src_tokens,
                'src_lengths': src_lengths,
            },
        }

        translations = self.task.inference_step(
            self.generator, self.models, sample)

        for i, (id, hypos) in enumerate(zip(batch.ids.tolist(), translations)):
            results.append((start_id + id, src_tokens[i], hypos))

        # sort output to match input order
        final_results = []
        for id, src_tokens, hypos in sorted(results, key=lambda x: x[0]):
            # Process top predictions
            tmp_res = []
            for hypo in hypos[:min(len(hypos), self.args.nbest)]:
                hypo_tokens, hypo_str, alignment = utils.post_process_prediction(
                    hypo_tokens=hypo['tokens'].int().cpu()[
                        len(src_tokens) - 1:],
                    src_str=None,
                    alignment=hypo['alignment'],
                    align_dict=self.align_dict,
                    tgt_dict=self.tgt_dict)

                detok_hypo_str = decode_fn(hypo_str)
                if eos_token_filter(detok_hypo_str):
                    detok_hypo_str = post_process(detok_hypo_str)
                score = hypo['score'] / math.log(2)  # convert to base 2
                tmp_res.append([detok_hypo_str, score])
            final_results.append(tmp_res)
        return final_results