---
license: openrail
language:
- en
pipeline_tag: conversational
---

<img src='https://images.unsplash.com/photo-1570610159825-ec5d3823660c?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1633&q=80'>

### Description

DialoGPT is a dialogue-focused variant of the GPT-2 (Generative Pretrained Transformer) language model, developed by Microsoft Research. It is a deep neural network trained on large amounts of conversational text to generate human-like responses.

DialoGPT uses the transformer architecture, a type of neural network designed for processing sequential data such as language. During training, the model is exposed to a large corpus of text and learns to predict the next word in a sequence given the previous words.
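
The next-word objective described above can be sketched in plain Python (the toy vocabulary and probabilities below are purely illustrative, not the model's actual outputs): the loss for one prediction is the negative log-probability the model assigns to the word that actually came next.

```python
import math

def next_word_loss(predicted_probs, target_word):
    """Cross-entropy for a single next-word prediction: -log p(target)."""
    return -math.log(predicted_probs[target_word])

# Toy distribution over the next word after "Harry raised his ..."
probs = {"wand": 0.7, "hand": 0.2, "voice": 0.1}
loss = next_word_loss(probs, "wand")
# A confident, correct prediction gives a low loss; an unlikely target gives a high one.
```

Training nudges the model's parameters so that, averaged over the corpus, this loss shrinks.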

In the context of dialogue, DialoGPT is trained to predict the response in a conversation given the conversation's context. This context can include one or more previous turns, along with any additional information such as the topic of the conversation or the speaker's personality.
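
The multi-turn context handling can be sketched without any libraries: DialoGPT-style models flatten the conversation turns into one token sequence, with each turn followed by an end-of-sequence marker (the marker string below is illustrative; the real tokenizer exposes it as `eos_token`).

```python
# Illustrative end-of-sequence marker separating conversation turns.
EOS = "<|endoftext|>"

def build_context(turns):
    """Join conversation turns into one string, each turn followed by EOS."""
    return "".join(turn + EOS for turn in turns)

history = ["Hello, who are you?", "I am Harry.", "Which school do you attend?"]
context = build_context(history)
# The model is then asked to continue this flattened string with the next response.
```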

At inference time, the model takes the current conversation context as input and generates a response by sampling from its predicted distribution over the vocabulary.
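
Sampling from the predicted distribution can be illustrated in plain Python (a hypothetical toy vocabulary and logits; real inference uses tensor libraries): the logits are scaled by a temperature, restricted to the top-k candidates, converted to probabilities with a softmax, and one token is drawn.

```python
import math
import random

def sample_next_token(logits, temperature=0.8, top_k=2, rng=random):
    """Draw one vocabulary index using temperature + top-k sampling."""
    # Keep only the k highest-scoring candidates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax over the kept candidates.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Inverse-CDF sampling: walk the cumulative distribution.
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]

vocab = ["yes", "no", "maybe", "never"]
logits = [2.0, 1.5, 0.3, -1.0]
token = vocab[sample_next_token(logits, temperature=0.8, top_k=2)]
# With top_k=2, only the two highest-scoring words can ever be drawn.
```

Lower temperatures concentrate probability on the highest-scoring tokens, making output more deterministic; higher temperatures flatten the distribution and increase variety.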

Overall, DialoGPT provides a flexible and powerful way to generate human-like text in a conversational context, enabling applications such as chatbots, conversational agents, and virtual assistants.

## Parameters

The model was trained for 40 epochs, using the following parameters:

```python
per_gpu_train_batch_size: int = 2
per_gpu_eval_batch_size: int = 2
gradient_accumulation_steps: int = 1
learning_rate: float = 5e-5
weight_decay: float = 0.0
adam_epsilon: float = 1e-8
max_grad_norm: float = 1.0
num_train_epochs: int = 20
max_steps: int = -1
warmup_steps: int = 0
logging_steps: int = 1000
save_steps: int = 3500
save_total_limit = None
eval_all_checkpoints: bool = False
no_cuda: bool = False
overwrite_output_dir: bool = True
overwrite_cache: bool = True
should_continue: bool = False
seed: int = 42
local_rank: int = -1
fp16: bool = False
fp16_opt_level: str = 'O1'
```
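
One convenient way to carry such a parameter set through a training script (a sketch only, not the author's actual code; the class name is hypothetical) is a Python dataclass, which gives defaults, per-run overrides, and readable printing for free:

```python
from dataclasses import dataclass

@dataclass
class TrainingArgs:
    """Hypothetical container mirroring a subset of the parameters above."""
    per_gpu_train_batch_size: int = 2
    per_gpu_eval_batch_size: int = 2
    gradient_accumulation_steps: int = 1
    learning_rate: float = 5e-5
    weight_decay: float = 0.0
    adam_epsilon: float = 1e-8
    max_grad_norm: float = 1.0
    num_train_epochs: int = 20
    seed: int = 42
    fp16: bool = False

args = TrainingArgs()
# Individual fields can be overridden per run, e.g. TrainingArgs(learning_rate=1e-4).
```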

## Usage

This is the DialoGPT small model, fine-tuned on Harry Potter's dialogue from Harry Potter and the Goblet of Fire.

A simple snippet showing how to run inference with this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('s3nh/DialoGPT-small-harry-potter-goblet-of-fire')
model = AutoModelForCausalLM.from_pretrained('s3nh/DialoGPT-small-harry-potter-goblet-of-fire')

for step in range(4):
    # Encode the user's input, appending the end-of-sequence token.
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # Append the new input to the chat history, if any.
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # Generate a response by sampling from the model's distribution.
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=200,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=100,
        top_p=0.7,
        temperature=0.8
    )

    # Decode and print only the newly generated tokens.
    print("HarryBot: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
```