k4d3 committed on
Commit
9f280b0
1 Parent(s): 7034452

Signed-off-by: Balazs Horvath <acsipont@gmail.com>

Files changed (1)
  1. README.md +11 -6
README.md CHANGED
@@ -146,19 +146,23 @@ Don't be afraid of editing Python scripts, unlike the real snake, these won't bi
 
 ## Dataset Preparation
 
-Before you begin collecting your dataset you will need to decide what you want to teach the model, it can be a character, a style or a new concept.
+Before you begin collecting your dataset you will need to decide what you want to teach the model, it can be a character, a style or a new concept.
 
-For now let's imagine you want to teach your model *wicerbeasts* so you can generate your VRChat avatar every night. For this we'll make good use of the furry <abbr title="image board">booru</abbr> [e621.net](https://e621.net/). There are two nice ways to download data from this site with the metadata intact, I'll start with the fastest and then I will explain how you can selectively browse around the site and get the images you like one by one.
+For now let's imagine you want to teach your model *wickerbeasts* so you can generate your VRChat avatar every night.
 
 ### Create the `training_dir` Directory
 
-Before starting we need a directory where we'll organize our datasets. Open up a terminal by pressing `Win + R` and typing in `pwsh`.
+Before starting we need a directory where we'll organize our datasets. Open up a terminal by pressing `Win + R` and typing in `pwsh`. We will also be using [git](https://git-scm.com/download/win) and [huggingface](https://huggingface.co/) to version control our smut. For brevity I'll refrain from giving you a tutorial on both. Once you have your newly created dataset on HF ready lets clone it. Make sure you change `user` in the first line to your HF username!
 
 ```pwsh
-Set-Location C:\
-
+git clone git@hf.co:/datasets/user/training_dir C:\training_dir
+Set-Location C:\training_dir
+git branch wickerbeast
+git checkout wickerbeast
 ```
 
+Let's continue with downloading some *wickerbeast* data but don't close the terminal window just yet, for this we'll make good use of the furry <abbr title="image board">booru</abbr> [e621.net](https://e621.net/). There are two nice ways to download data from this site with the metadata intact, I'll start with the fastest and then I will explain how you can selectively browse around the site and get the images you like one by one.
+
 ### Grabber
 
 [Grabber](https://github.com/Bionus/imgbrd-grabber) makes your life easier when trying to compile datasets quickly from imageboards.
@@ -173,7 +177,8 @@ You should also enable `Separate log files` for e621, this will download the met
 
 For Pony I've set up the Text file content like so: `rating_%rating%, %all:separator=^, %` for other models you might want to replace `rating_%rating%` with just `%rating%`.
 
-You should also set the `Folder` into which the images will get downloaded. Let's try to use
+You should also set the `Folder` into which the images will get downloaded. Let's use `C:\training_dir\1_wickerbeast` for both groups.
+
 Now you are ready to right-click on each group and download the images.
 
 ---