---
license: openrail
language:
- en
pipeline_tag: conversational
---

### Description

DialoGPT is a dialogue-oriented variant of GPT (Generative Pretrained Transformer), the language model family developed by OpenAI; DialoGPT itself was released by Microsoft. It is a deep neural language model trained on large amounts of text to generate human-like text.

DialoGPT uses the transformer architecture, a type of neural network designed for processing sequential data such as language. During training, the model is exposed to a large corpus of text and learns to predict the next word in a sequence given the previous words.

In the context of dialog, DialoGPT is trained to predict the response in a conversation given the conversation's context. This context can include one or more turns of the conversation, along with any additional information such as the topic of the conversation or the speaker's personality.

At inference time, the model takes the current context of the conversation as input and generates a response by sampling from its predicted distribution over the vocabulary.

Overall, DialoGPT provides a flexible way to generate human-like text in a conversational setting, which makes it suitable for applications such as chatbots, conversational agents, and virtual assistants.

## Parameters

The model was trained for 40 epochs, using the following parameters.
```
self.per_gpu_train_batch_size: int = 2
self.per_gpu_eval_batch_size: int = 2
self.gradient_accumulation_steps: int = 1
self.learning_rate: float = 5e-5
self.weight_decay: float = 0.0
self.adam_epsilon: float = 1e-8
self.max_grad_norm: float = 1.0
self.num_train_epochs: int = 20
self.max_steps: int = -1
self.warmup_steps: int = 0
self.logging_steps: int = 1000
self.save_steps: int = 3500
self.save_total_limit = None
self.eval_all_checkpoints: bool = False
self.no_cuda: bool = False
self.overwrite_output_dir: bool = True
self.overwrite_cache: bool = True
self.should_continue: bool = False
self.seed: int = 42
self.local_rank: int = -1
self.fp16: bool = False
self.fp16_opt_level: str = 'O1'
```

## Usage

This is the small version of DialoGPT, fine-tuned on Harry Potter dialogue from Harry Potter and the Goblet of Fire.

A simple snippet showing how to run inference with this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 's3nh/DialoGPT-small-harry-potter-goblet-of-fire'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
model = AutoModelForCausalLM.from_pretrained(model_name)

# Chat for four turns
for step in range(4):
    # Encode the user input and append the end-of-sequence token
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # Append the new input to the chat history (there is no history on the first turn)
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # Sample a response; the output contains the whole conversation so far
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=200,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=100,
        top_p=0.7,
        temperature=0.8
    )

    # Decode and print only the newly generated tokens
    print("HarryBot: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
```
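
The `temperature`, `top_k` and `top_p` arguments passed to `generate` control how each next token is sampled from the model's predicted distribution over the vocabulary. The sketch below illustrates the idea in plain Python on a toy list of logits. It is a simplified illustration, not the `transformers` implementation, and the function names `filtered_probs` and `sample_token` are hypothetical.

```python
import math
import random


def filtered_probs(logits, temperature=0.8, top_k=100, top_p=0.7):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    Hypothetical helper for illustration only.
    """
    # Temperature scaling followed by a numerically stable softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix of the ranked tokens
    # whose renormalised cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filtered_probs(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy example: four-token vocabulary
toy_logits = [2.0, 1.0, 0.0, -1.0]
print(filtered_probs(toy_logits, temperature=1.0, top_k=3, top_p=0.7))
print(sample_token(toy_logits, rng=random.Random(0)))
```

Lower values of `temperature` and `top_p` concentrate probability on the most likely tokens and make replies more predictable, while higher values allow more varied responses.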