sayanbanerjee32 committed on
Commit 716f62a · verified · 1 Parent(s): 569af06

Update README.md

  1. README.md +17 -3
README.md CHANGED
@@ -1,3 +1,17 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+ ## Dataset
+ A collection of William Shakespeare's plays.
+ - The tiktoken GPT-2 tokenizer is used for tokenization
+ - Total number of tokens: 338025
+
+ ## The HuggingFace Spaces Gradio App
+
+ The app is available [here](https://huggingface.co/spaces/sayanbanerjee32/nanogpt2_text_generator)
+
+ The app takes the following inputs:
+ 1. Seed Text (Prompt) - The input text given to the GPT model, from which it generates a continuation. If no text is provided, a single space (" ") is used as the input.
+ 2. Max tokens to generate - The number of tokens the model generates. The default value is 100.
+ 3. Temperature - Accepts values between 0 and 1. A higher value introduces more randomness into next-token generation. The default value is 0.7.
+ 4. Select Top N in each step - An optional field. If no value is provided (or the value is <= 0), all available tokens are considered for the next-token prediction, based on their softmax probabilities. If a number is set, only that many top tokens are considered for the next-token prediction.
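
The Temperature and Top N inputs described in the README correspond to a common sampling recipe: keep the N highest-scoring tokens, divide their scores by the temperature, apply softmax, and draw one token. The sketch below illustrates that recipe in plain Python. It is not taken from the app's source; `sample_next_token`, `toy_logits`, and the choice to reject a temperature of 0 are assumptions made for illustration.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_n=None, rng=random):
    """Sample one token index from raw logits (hypothetical helper, not the app's code)."""
    if temperature <= 0:
        # Assumption: reject 0 rather than guess how the app handles it.
        raise ValueError("temperature must be greater than 0")
    # No value, or a value <= 0, means every token stays in play.
    if top_n is None or top_n <= 0:
        top_n = len(logits)
    # Keep the indices of the top_n largest logits.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    candidates = ranked[:top_n]
    # Temperature scaling: smaller values sharpen the distribution.
    scaled = [logits[i] / temperature for i in candidates]
    # Numerically stable softmax numerators over the surviving candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(candidates, weights=weights, k=1)[0]


toy_logits = [1.0, 5.0, 2.0]  # made-up scores for a 3-token vocabulary
print(sample_next_token(toy_logits, top_n=1))  # always 1, the highest-scoring index
print(sample_next_token(toy_logits, temperature=1.0, top_n=0))  # any of 0, 1, 2
```

With `top_n=1` the draw is deterministic (greedy decoding), while a `top_n` of 0 or `None` leaves the full vocabulary available, matching the behaviour the README describes for an empty or non-positive Top N.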