MinzaKhan committed on
Commit
5595ead
1 Parent(s): af63ac0

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -45,4 +45,10 @@ The Udying Fire - 52036
  The Red Room - 4618
 
- The total number of tokens in the corpus is 1043588
+ The total number of tokens in the corpus is 1043588.
+
+ The corpus was created by downloading and combining 14 novels by the author H. G. Wells from Project Gutenberg. Most of these novels are science fiction, so the model has been fine-tuned to generate science fiction text in the style of H. G. Wells.
+
+ For each of the 14 novels, the text added by Project Gutenberg at the beginning and end was removed. The entire text of each novel
+ was then converted into a single line, and that line was broken into 20 parts, giving 20 lines per novel. The lines from all the novels were then combined and
+ stored in a single text file, which was used to fine-tune the model.
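
The preprocessing described in the added README text could be sketched roughly as follows. This is only an illustration, not the author's script: the function names are hypothetical, the `*** START OF ... ***` / `*** END OF ... ***` marker patterns are an assumption about how the Project Gutenberg header and footer are delimited (the wording varies between files), and splitting on word boundaries into near-equal parts is also an assumption, since the README does not say how each line was divided.

```python
import re

# Assumed marker patterns; real Project Gutenberg files vary in wording.
START_RE = re.compile(r"\*\*\*\s*START OF.*?\*\*\*", re.IGNORECASE | re.DOTALL)
END_RE = re.compile(r"\*\*\*\s*END OF.*?\*\*\*", re.IGNORECASE | re.DOTALL)


def strip_gutenberg(text):
    """Drop everything up to the start marker and from the end marker on."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]
    return text


def to_single_line(text):
    """Collapse all whitespace, including newlines, into single spaces."""
    return " ".join(text.split())


def split_into_parts(line, n_parts=20):
    """Split one line into n_parts pieces of near-equal word count.

    Assumption: parts are cut on word boundaries. If the line has fewer
    words than n_parts, the trailing parts are empty strings.
    """
    words = line.split(" ")
    size, extra = divmod(len(words), n_parts)
    parts, pos = [], 0
    for i in range(n_parts):
        step = size + (1 if i < extra else 0)
        parts.append(" ".join(words[pos:pos + step]))
        pos += step
    return parts


def build_corpus(novels, n_parts=20):
    """Turn a list of raw novel texts into one newline-separated corpus."""
    lines = []
    for raw in novels:
        body = to_single_line(strip_gutenberg(raw))
        lines.extend(split_into_parts(body, n_parts))
    return "\n".join(lines) + "\n"
```

With 14 novels and `n_parts=20`, `build_corpus` would return 280 lines, which could then be written to a single text file for fine-tuning.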