k4d3 committed on
Commit 84f3a3d • 1 Parent(s): 1fb58ab

Signed-off-by: Balazs Horvath <acsipont@gmail.com>

dataset_tools/Count Tokens in Sample Prompts.ipynb CHANGED
@@ -1,10 +1,181 @@
1
  {
2
  "cells": [
3
  {
4
  "cell_type": "code",
5
- "execution_count": 3,
6
  "metadata": {},
7
  "outputs": [
8
  {
9
  "data": {
10
  "text/html": [
@@ -193,11 +364,69 @@
193
  {
194
  "data": {
195
  "text/html": [
196
- "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">Total number of prompts in realistic-sample-prompts.txt: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">6</span>\n",
197
  "</pre>\n"
198
  ],
199
  "text/plain": [
200
- "\u001b[1mTotal number of prompts in realistic-sample-prompts.txt: \u001b[0m\u001b[1;36m6\u001b[0m\n"
201
  ]
202
  },
203
  "metadata": {},
@@ -210,7 +439,6 @@
210
  "from rich.console import Console\n",
211
  "from rich.table import Table\n",
212
  "\n",
213
- "\n",
214
  "def count_tokens(text):\n",
215
  " enc = tiktoken.get_encoding(\"cl100k_base\")\n",
216
  " tokens = enc.encode(text)\n",
 
1
  {
2
  "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Count Tokens in Sample Prompts\n",
8
+ "\n",
9
+ "---\n",
10
+ "\n",
11
+ "This script is a utility for analyzing prompts in text files, counting the tokens for each sample prompt, and displaying the results in a visually appealing table format using the Rich library. It also provides a warning if the positive prompt's token count exceeds a certain threshold (77 in this case).\n",
12
+ "\n",
13
+ "This script is designed to process text files containing positive and negative prompts, count the number of tokens for each prompt, and display the results in a tabular format using the Rich library.\n",
14
+ "\n",
15
+ "1. It imports the necessary libraries: `os` for file and directory operations, `tiktoken` for encoding and counting tokens, and `rich.console` and `rich.table` for creating a console interface and a table for displaying the results.\n",
16
+ "\n",
17
+ "2. The `count_tokens` function takes a text input and returns the number of tokens in that text using the `tiktoken` library's `cl100k_base` encoding.\n",
18
+ "\n",
19
+ "3. The script creates a `Console` object from the `rich` library.\n",
20
+ "\n",
21
+ "4. It iterates through all files in the `E:\\training_dir` directory that end with `-sample-prompts.txt`.\n",
22
+ "\n",
23
+ "5. For each file, it opens the file and reads its contents line by line.\n",
24
+ "\n",
25
+ "6. Each line is expected to be in the format `<positive_prompt> --n <negative_prompt> --<additional_arguments>`. The script splits the line at `--n` to separate the positive and negative prompts. As it works in \n",
26
+ "\n",
27
+ "7. It counts the number of tokens for both the positive and negative prompts using the `count_tokens` function.\n",
28
+ "\n",
29
+ "8. A `Table` object from the `rich` library is created, and the positive and negative prompts, along with their token counts, are added to the table as separate rows. The positive prompts are displayed in green, and the negative prompts are displayed in red.\n",
30
+ "\n",
31
+ "9. The table is printed to the console using the `Console.print` method.\n",
32
+ "\n",
33
+ "10. If the positive prompt's token count exceeds 77 (75 plus a buffer of 2), a warning message is printed in bold red.\n",
34
+ "\n",
35
+ "11. The script keeps track of the total number of prompts processed in the current file and prints it at the end."
36
+ ]
37
+ },
38
  {
39
  "cell_type": "code",
40
+ "execution_count": 9,
41
  "metadata": {},
42
  "outputs": [
43
+ {
44
+ "data": {
45
+ "text/html": [
46
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Processing file: E:\\training_dir\\furrysticker-sample-prompts.txt\n",
47
+ "</pre>\n"
48
+ ],
49
+ "text/plain": [
50
+ "Processing file: E:\\training_dir\\furrysticker-sample-prompts.txt\n"
51
+ ]
52
+ },
53
+ "metadata": {},
54
+ "output_type": "display_data"
55
+ },
56
+ {
57
+ "data": {
58
+ "text/html": [
59
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
60
+ "┃<span style=\"font-weight: bold\"> Prompt Type </span>┃<span style=\"font-weight: bold\"> Prompt </span>┃<span style=\"font-weight: bold\"> Token Count </span>┃\n",
61
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
62
+ "│ <span style=\"color: #008000; text-decoration-color: #008000\">Positive</span> │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 77 │\n",
63
+ "│ │ anthro male magic user wolf, purple wizard hat, purple wizard robe, green eyes, │ │\n",
64
+ "│ │ humanoid hands, claws, pointing up with one hand, \\(white fur:1.4\\), furry sticker, │ │\n",
65
+ "│ │ simple background, black background, white outline │ │\n",
66
+ "│ <span style=\"color: #800000; text-decoration-color: #800000\">Negative</span> │ low quality, worst quality, blurry, sticker │ 9 │\n",
67
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n",
68
+ "</pre>\n"
69
+ ],
70
+ "text/plain": [
71
+ "┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
72
+ "┃\u001b[1m \u001b[0m\u001b[1mPrompt Type\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPrompt \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mToken Count\u001b[0m\u001b[1m \u001b[0m┃\n",
73
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
74
+ "│ \u001b[32mPositive\u001b[0m │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 77 │\n",
75
+ "│ │ anthro male magic user wolf, purple wizard hat, purple wizard robe, green eyes, │ │\n",
76
+ "│ │ humanoid hands, claws, pointing up with one hand, \\(white fur:1.4\\), furry sticker, │ │\n",
77
+ "│ │ simple background, black background, white outline │ │\n",
78
+ "│ \u001b[31mNegative\u001b[0m │ low quality, worst quality, blurry, sticker │ 9 │\n",
79
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n"
80
+ ]
81
+ },
82
+ "metadata": {},
83
+ "output_type": "display_data"
84
+ },
85
+ {
86
+ "data": {
87
+ "text/html": [
88
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
89
+ "┃<span style=\"font-weight: bold\"> Prompt Type </span>┃<span style=\"font-weight: bold\"> Prompt </span>┃<span style=\"font-weight: bold\"> Token Count </span>┃\n",
90
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
91
+ "│ <span style=\"color: #008000; text-decoration-color: #008000\">Positive</span> │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 62 │\n",
92
+ "│ │ anthro male fox, red fur, he has blue eyes with a stark gaze, dialogue bubble, text │ │\n",
93
+ "│ │ box, furry sticker, simple background, black background, white outline │ │\n",
94
+ "│ <span style=\"color: #800000; text-decoration-color: #800000\">Negative</span> │ low quality, worst quality, blurry, sticker │ 9 │\n",
95
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n",
96
+ "</pre>\n"
97
+ ],
98
+ "text/plain": [
99
+ "┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
100
+ "┃\u001b[1m \u001b[0m\u001b[1mPrompt Type\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPrompt \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mToken Count\u001b[0m\u001b[1m \u001b[0m┃\n",
101
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
102
+ "│ \u001b[32mPositive\u001b[0m │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 62 │\n",
103
+ "│ │ anthro male fox, red fur, he has blue eyes with a stark gaze, dialogue bubble, text │ │\n",
104
+ "│ │ box, furry sticker, simple background, black background, white outline │ │\n",
105
+ "│ \u001b[31mNegative\u001b[0m │ low quality, worst quality, blurry, sticker │ 9 │\n",
106
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n"
107
+ ]
108
+ },
109
+ "metadata": {},
110
+ "output_type": "display_data"
111
+ },
112
+ {
113
+ "data": {
114
+ "text/html": [
115
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
116
+ "┃<span style=\"font-weight: bold\"> Prompt Type </span>┃<span style=\"font-weight: bold\"> Prompt </span>┃<span style=\"font-weight: bold\"> Token Count </span>┃\n",
117
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
118
+ "│ <span style=\"color: #008000; text-decoration-color: #008000\">Positive</span> │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 50 │\n",
119
+ "│ │ anthro female red panda, she has amber eyes, furry sticker, simple background, │ │\n",
120
+ "│ │ black background, white outline │ │\n",
121
+ "│ <span style=\"color: #800000; text-decoration-color: #800000\">Negative</span> │ low quality, worst quality, blurry, sticker │ 9 │\n",
122
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n",
123
+ "</pre>\n"
124
+ ],
125
+ "text/plain": [
126
+ "┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
127
+ "┃\u001b[1m \u001b[0m\u001b[1mPrompt Type\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPrompt \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mToken Count\u001b[0m\u001b[1m \u001b[0m┃\n",
128
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
129
+ "│ \u001b[32mPositive\u001b[0m │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 50 │\n",
130
+ "│ │ anthro female red panda, she has amber eyes, furry sticker, simple background, │ │\n",
131
+ "│ │ black background, white outline │ │\n",
132
+ "│ \u001b[31mNegative\u001b[0m │ low quality, worst quality, blurry, sticker │ 9 │\n",
133
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n"
134
+ ]
135
+ },
136
+ "metadata": {},
137
+ "output_type": "display_data"
138
+ },
139
+ {
140
+ "data": {
141
+ "text/html": [
142
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
143
+ "┃<span style=\"font-weight: bold\"> Prompt Type </span>┃<span style=\"font-weight: bold\"> Prompt </span>┃<span style=\"font-weight: bold\"> Token Count </span>┃\n",
144
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
145
+ "│ <span style=\"color: #008000; text-decoration-color: #008000\">Positive</span> │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 62 │\n",
146
+ "│ │ anthro female lizard, blue scales, white body, scalie, long white claws, she has │ │\n",
147
+ "│ │ yellow eyes, furry sticker, simple background, black background, white outline │ │\n",
148
+ "│ <span style=\"color: #800000; text-decoration-color: #800000\">Negative</span> │ low quality, worst quality, blurry, sticker │ 9 │\n",
149
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n",
150
+ "</pre>\n"
151
+ ],
152
+ "text/plain": [
153
+ "┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
154
+ "┃\u001b[1m \u001b[0m\u001b[1mPrompt Type\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPrompt \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mToken Count\u001b[0m\u001b[1m \u001b[0m┃\n",
155
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
156
+ "│ \u001b[32mPositive\u001b[0m │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, solo, │ 62 │\n",
157
+ "│ │ anthro female lizard, blue scales, white body, scalie, long white claws, she has │ │\n",
158
+ "│ │ yellow eyes, furry sticker, simple background, black background, white outline │ │\n",
159
+ "│ \u001b[31mNegative\u001b[0m │ low quality, worst quality, blurry, sticker │ 9 │\n",
160
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n"
161
+ ]
162
+ },
163
+ "metadata": {},
164
+ "output_type": "display_data"
165
+ },
166
+ {
167
+ "data": {
168
+ "text/html": [
169
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">Total number of prompts in furrysticker-sample-prompts.txt: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">4</span>\n",
170
+ "</pre>\n"
171
+ ],
172
+ "text/plain": [
173
+ "\u001b[1mTotal number of prompts in furrysticker-sample-prompts.txt: \u001b[0m\u001b[1;36m4\u001b[0m\n"
174
+ ]
175
+ },
176
+ "metadata": {},
177
+ "output_type": "display_data"
178
+ },
179
  {
180
  "data": {
181
  "text/html": [
 
364
  {
365
  "data": {
366
  "text/html": [
367
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
368
+ "┃<span style=\"font-weight: bold\"> Prompt Type </span>┃<span style=\"font-weight: bold\"> Prompt </span>┃<span style=\"font-weight: bold\"> Token Count </span>┃\n",
369
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
370
+ "│ <span style=\"color: #008000; text-decoration-color: #008000\">Positive</span> │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, portrait, │ 65 │\n",
371
+ "│ │ anthro male dragon, scale iridescence, detailed background, amazing_background, │ │\n",
372
+ "│ │ scenery porn, snowy mountain peak, on back, sexy pose, looking at viewer, │ │\n",
373
+ "│ │ realistic, photo │ │\n",
374
+ "│ <span style=\"color: #800000; text-decoration-color: #800000\">Negative</span> │ low quality, worst quality, blurred background, blurry, simple background │ 13 │\n",
375
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n",
376
+ "</pre>\n"
377
+ ],
378
+ "text/plain": [
379
+ "┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
380
+ "┃\u001b[1m \u001b[0m\u001b[1mPrompt Type\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPrompt \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mToken Count\u001b[0m\u001b[1m \u001b[0m┃\n",
381
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
382
+ "│ \u001b[32mPositive\u001b[0m │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, portrait, │ 65 │\n",
383
+ "│ │ anthro male dragon, scale iridescence, detailed background, amazing_background, │ │\n",
384
+ "│ │ scenery porn, snowy mountain peak, on back, sexy pose, looking at viewer, │ │\n",
385
+ "│ │ realistic, photo │ │\n",
386
+ "│ \u001b[31mNegative\u001b[0m │ low quality, worst quality, blurred background, blurry, simple background │ 13 │\n",
387
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n"
388
+ ]
389
+ },
390
+ "metadata": {},
391
+ "output_type": "display_data"
392
+ },
393
+ {
394
+ "data": {
395
+ "text/html": [
396
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
397
+ "┃<span style=\"font-weight: bold\"> Prompt Type </span>┃<span style=\"font-weight: bold\"> Prompt </span>┃<span style=\"font-weight: bold\"> Token Count </span>┃\n",
398
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
399
+ "│ <span style=\"color: #008000; text-decoration-color: #008000\">Positive</span> │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, full-length │ 71 │\n",
400
+ "│ │ portrait, anthro female kobold, scalie, scale iridescence, detailed background, │ │\n",
401
+ "│ │ amazing_background, scenery porn, snowy mountain peak, on back, sexy pose, looking │ │\n",
402
+ "│ │ at viewer, realistic, photo │ │\n",
403
+ "│ <span style=\"color: #800000; text-decoration-color: #800000\">Negative</span> │ low quality, worst quality, blurred background, blurry, simple background │ 13 │\n",
404
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n",
405
+ "</pre>\n"
406
+ ],
407
+ "text/plain": [
408
+ "┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓\n",
409
+ "┃\u001b[1m \u001b[0m\u001b[1mPrompt Type\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mPrompt \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mToken Count\u001b[0m\u001b[1m \u001b[0m┃\n",
410
+ "┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩\n",
411
+ "│ \u001b[32mPositive\u001b[0m │ score_9, score_8_up, score_7_up, score_6_up, rating_safe, source_furry, full-length │ 71 │\n",
412
+ "│ │ portrait, anthro female kobold, scalie, scale iridescence, detailed background, │ │\n",
413
+ "│ │ amazing_background, scenery porn, snowy mountain peak, on back, sexy pose, looking │ │\n",
414
+ "│ │ at viewer, realistic, photo │ │\n",
415
+ "│ \u001b[31mNegative\u001b[0m │ low quality, worst quality, blurred background, blurry, simple background │ 13 │\n",
416
+ "└─────────────┴─────────────────────────────────────────────────────────────────────────────────────┴─────────────┘\n"
417
+ ]
418
+ },
419
+ "metadata": {},
420
+ "output_type": "display_data"
421
+ },
422
+ {
423
+ "data": {
424
+ "text/html": [
425
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">Total number of prompts in realistic-sample-prompts.txt: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">8</span>\n",
426
  "</pre>\n"
427
  ],
428
  "text/plain": [
429
+ "\u001b[1mTotal number of prompts in realistic-sample-prompts.txt: \u001b[0m\u001b[1;36m8\u001b[0m\n"
430
  ]
431
  },
432
  "metadata": {},
 
439
  "from rich.console import Console\n",
440
  "from rich.table import Table\n",
441
  "\n",
 
442
  "def count_tokens(text):\n",
443
  " enc = tiktoken.get_encoding(\"cl100k_base\")\n",
444
  " tokens = enc.encode(text)\n",
dataset_tools/{Tag Counter.ipynb → Tag Frequency.ipynb} RENAMED
@@ -4,7 +4,7 @@
4
  "cell_type": "markdown",
5
  "metadata": {},
6
  "source": [
7
- "## Tag Counter\n",
8
  "----\n",
9
  "\n",
10
  "This Python script extracts tags from `.txt` files within a specified directory and its subdirectories. It then counts the frequency of each tag and lists them in descending order of frequency."
 
4
  "cell_type": "markdown",
5
  "metadata": {},
6
  "source": [
7
+ "## Tag Frequency\n",
8
  "----\n",
9
  "\n",
10
  "This Python script extracts tags from `.txt` files within a specified directory and its subdirectories. It then counts the frequency of each tag and lists them in descending order of frequency."
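
Note on the renamed notebook: only its heading changes in this commit, so the counting code itself is not visible here. The behaviour its description states (collect tags from `.txt` files in a directory tree, count them, list them by descending frequency) could be sketched roughly as below. This assumes tags are comma-separated within each file, which the description does not confirm; `count_tags` and `tags_by_frequency` are hypothetical names.

```python
from collections import Counter
from pathlib import Path


def count_tags(root):
    """Count tags across all .txt files under root, including subdirectories."""
    counts = Counter()
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(encoding="utf-8")
        # Assumption: tags are comma-separated; surrounding whitespace is ignored.
        counts.update(tag.strip() for tag in text.split(",") if tag.strip())
    return counts


def tags_by_frequency(root):
    """Return (tag, count) pairs sorted by descending count."""
    return count_tags(root).most_common()
```

`Counter.most_common()` already returns entries from most to least frequent, so no separate sort is needed.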
static/hoodwink/00000128-03212113-512.png ADDED

Git LFS Details

  • SHA256: fb6cd4feac83aae831d54fd545fc5cd4a0c44d0021995255512611b314719b43
  • Pointer size: 131 Bytes
  • Size of remote file: 241 kB
static/hoodwink/00000128-03212113.png ADDED

Git LFS Details

  • SHA256: 79c4a9bc8f608e0172ef1c79713992b63ee8e902248a25c968a84f4e664160c6
  • Pointer size: 131 Bytes
  • Size of remote file: 746 kB