---
license: wtfpl
---

<!-- markdownlint-disable MD029 -->

# AI Image Processing Toolkit

---

A collection of specialized scripts for AI image processing, dataset preparation, and model training workflows.

## 🛠️ Scripts Overview

---

### `wdv3`

An image tagging script built on the WD V3 tagger models by [SmilingWolf](https://huggingface.co/SmilingWolf), based on [wdv3-timm](https://github.com/neggles/wdv3-timm). It supports multiple model architectures (ViT, SwinV2, ConvNeXt) and can process single images or entire directories recursively.

#### Features

- Multiple model architecture support
- Batch processing capabilities
- Adjustable confidence thresholds
- CUDA acceleration with FP16 support
- JXL image format support
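
As a rough illustration of what an adjustable confidence threshold usually means for a tagger, the sketch below filters a mapping of tag probabilities. The function name `select_tags`, the default value, and the example scores are all hypothetical; they are not taken from `wdv3`, whose actual options and defaults may differ.

```python
def select_tags(scores: dict[str, float], threshold: float = 0.35) -> list[str]:
    """Return the tags whose probability meets the threshold, highest first.

    The default threshold is an assumed value for illustration only.
    """
    kept = [(tag, prob) for tag, prob in scores.items() if prob >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in kept]


# Hypothetical model output: tag -> probability
example_scores = {"outdoors": 0.91, "tree": 0.62, "car": 0.12}
print(", ".join(select_tags(example_scores, threshold=0.5)))
```

Raising the threshold yields fewer but more reliable tags; lowering it yields more tags at the cost of more false positives.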

### `train_functions`

A set of ZSH functions for managing AI model training workflows:

- Script execution management
- Training variable setup
- Git repository state tracking
- Output directory management
- Automatic cleanup of empty outputs

### `git-wrapper`

Enhanced Git functionality for dataset management:

- Automatic submodule handling
- LFS integration for JXL files
- Dataset-specific Git attributes management

### `check4sig`

Utility for detecting watermark mentions in dataset caption files:

- Scans .caption files for watermark-related text
- Batch processing support
- Interactive editing with nvim
- Recursive directory scanning
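
The scanning step could look roughly like the sketch below. The keyword list and the function name `find_flagged_captions` are assumptions made for illustration; the real script may match different terms, and the interactive nvim step is not shown.

```python
import re
from pathlib import Path

# Assumed keyword list; the actual script may look for other terms.
WATERMARK_PATTERN = re.compile(r"\b(watermark|signature)", re.IGNORECASE)


def find_flagged_captions(root: str) -> list[Path]:
    """Recursively collect .caption files whose text matches the pattern."""
    flagged = []
    for path in sorted(Path(root).rglob("*.caption")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if WATERMARK_PATTERN.search(text):
            flagged.append(path)
    return flagged
```

Each returned path could then be opened in an editor for manual review.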

### `gallery-dl`

Directory-aware wrapper for gallery-dl:

- Automatically changes to ~/datasets directory
- Maintains consistent download locations
- Preserves original command functionality

### `joy`

An image captioning system built on [JoyCaption](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/tree/main) by [fancyfeast](https://huggingface.co/fancyfeast), which combines CLIP with an LLM:

- Multiple caption styles (descriptive, training prompts, art critic, etc.)
- Custom image adapters
- Tag-based caption generation
- Batch processing support

### `png2mp4`

Training progress visualization tool:

- Converts PNG sequences to MP4
- Customizable frame rates and durations
- Step counter overlay support
- Multiple sample handling

### `xyplot`

Image comparison grid generator:

- Supports multiple image formats
- Customizable grid layouts
- Optional row/column labels
- Automatic image padding and alignment

### `concat_captions`

Utility for combining multiple caption files:

- Merges .caption and .tags files
- Maintains original image associations
- Batch processing support
- Error handling for missing files
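
A minimal sketch of the merging idea, pairing each `.caption` file with the `.tags` file that shares its stem. The tags-first ordering, the `.txt` output extension, and the function names are assumptions for illustration; the real script may order, join, and store the result differently.

```python
from pathlib import Path
from typing import Optional


def concat_pair(caption_path: Path) -> Optional[Path]:
    """Merge one .caption file with its sibling .tags file, if present."""
    tags_path = caption_path.with_suffix(".tags")
    if not tags_path.exists():
        return None  # report as missing instead of raising
    tags = tags_path.read_text(encoding="utf-8").strip()
    caption = caption_path.read_text(encoding="utf-8").strip()
    # Assumed output: "<tags>, <caption>" written to a .txt with the same stem.
    out_path = caption_path.with_suffix(".txt")
    out_path.write_text(f"{tags}, {caption}\n", encoding="utf-8")
    return out_path


def concat_directory(root: str) -> tuple[list[Path], list[Path]]:
    """Return (written outputs, captions that had no matching .tags file)."""
    written, missing = [], []
    for caption_path in sorted(Path(root).rglob("*.caption")):
        out_path = concat_pair(caption_path)
        if out_path is None:
            missing.append(caption_path)
        else:
            written.append(out_path)
    return written, missing
```

Because the output keeps the stem of the source files, it stays associated with the same image.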

### `stats`

Directory analysis and statistics generation tool that provides detailed file counts and metrics:

- Detailed file counting by extension with color-coded output for different file types (JXL, PNG, JPG, etc.)
- Multiple sorting options (by name, count, or specific file types)
- Recursive directory scanning with aggregated statistics
- Color-coded thresholds for dataset size evaluation
- Automatic categorization of files into image and text groups
- Grand total calculations across all subdirectories
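
The counting and aggregation described above can be sketched with the standard library alone. The function names are hypothetical, and the color-coded output, sorting options, and thresholds of the real tool are omitted.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root: str) -> dict[str, Counter]:
    """Map each directory (relative to root) to a Counter of file extensions."""
    root_path = Path(root)
    per_dir: dict[str, Counter] = {}
    for path in root_path.rglob("*"):
        if path.is_file():
            key = str(path.parent.relative_to(root_path))
            # Normalize case so "PNG" and "png" are counted together.
            ext = path.suffix.lower().lstrip(".") or "(none)"
            per_dir.setdefault(key, Counter())[ext] += 1
    return per_dir


def grand_total(per_dir: dict[str, Counter]) -> Counter:
    """Sum the per-directory counters into one overall counter."""
    total = Counter()
    for counts in per_dir.values():
        total.update(counts)
    return total
```

Files directly inside the root directory are reported under the key `"."`.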

### `shortcode`

Hugo-compatible shortcode generator for image galleries with blurhash integration:

- Generates Hugo-compatible shortcode blocks for each image
- Integrates blurhash codes for progressive image loading
- Automatically extracts and includes image dimensions
- Preserves and integrates image captions from metadata
- Supports grid layout configurations
- Processes directories recursively while maintaining structure
- Handles relative path resolution for static content

### `yiffdata`

Comprehensive image metadata extraction and JSON generation utility:

- Extracts precise image dimensions using PIL
- Combines existing blurhash codes from .bh files
- Integrates caption data from .caption files
- Generates consolidated JSON output with all metadata
- Maintains original filename references
- Supports batch processing of entire directories
- Preserves file relationships and metadata hierarchy

### `txt2tags`

Batch file extension conversion utility for dataset management:

- Converts .txt files to .tags format for ML training compatibility
- Preserves original file content and structure
- Supports recursive directory traversal
- Interactive mode for selective conversion
- Maintains original file timestamps and permissions
- Simple command-line interface with directory input
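
Since only the extension changes, a plain rename is enough to keep content, timestamps, and permissions intact. The sketch below assumes that approach; the function name `txt_to_tags`, the `dry_run` flag, and the skip-if-exists rule are illustrative choices, not the script's documented behavior.

```python
from pathlib import Path


def txt_to_tags(root: str, dry_run: bool = False) -> list[tuple[Path, Path]]:
    """Rename every .txt file under root to .tags and return the (old, new) pairs."""
    renamed = []
    # sorted() materializes the listing first, so renaming does not disturb iteration.
    for src in sorted(Path(root).rglob("*.txt")):
        dst = src.with_suffix(".tags")
        if dst.exists():
            continue  # assumed policy: never overwrite an existing .tags file
        if not dry_run:
            src.rename(dst)
        renamed.append((src, dst))
    return renamed
```

Calling it with `dry_run=True` lists what would change without touching any files.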

### `txt2emoji`

Advanced text-to-emoji conversion system with context awareness:

- Sophisticated word-to-emoji mapping with custom dictionaries
- Context-aware emoji selection to avoid redundancy
- Detailed conversion explanations with rationale
- Batch processing with multiple output formats
- Configurable threshold and filtering options
- NLTK integration for improved text parsing
- Extensive customization options for emoji mappings

### `jtp2`

Image classification system using [RedRocket](https://huggingface.co/RedRocket)'s [PILOT2](https://huggingface.co/RedRocket/JointTaggerProject/tree/main/JTP_PILOT2) model:

- Implements Vision Transformer architecture with custom modifications
- Features GatedHead classifier for improved accuracy
- CUDA-accelerated inference with FP16 support
- Configurable confidence thresholds for tag generation
- Comprehensive batch processing capabilities
- Automatic tag file generation alongside images
- Supports multiple image formats including JXL

### `keyframe`

Efficient video keyframe extraction tool using FFmpeg:

- Extracts high-quality keyframes from video files
- Creates organized output directories automatically
- Maintains original frame quality and metadata
- Intelligent I-frame detection and extraction
- Sequential frame naming with padding
- Minimal quality loss during extraction
- Simple command-line interface
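
One common way to extract keyframes with FFmpeg is to tell the decoder to skip everything except keyframes. The sketch below builds such a command; it is an assumption about how a tool like this might call FFmpeg, not the script's confirmed invocation. The output naming pattern and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_keyframe_command(video: str, out_dir: str) -> list[str]:
    """Build an ffmpeg command that writes only keyframes as numbered PNG files."""
    pattern = str(Path(out_dir) / "frame_%05d.png")  # zero-padded sequential names
    return [
        "ffmpeg",
        "-skip_frame", "nokey",  # input option: decode keyframes only
        "-i", video,
        "-vsync", "vfr",         # no duplicated frames; newer builds prefer -fps_mode vfr
        pattern,
    ]


def extract_keyframes(video: str) -> Path:
    """Create a directory named after the video and extract keyframes into it."""
    out_dir = Path(video).with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_keyframe_command(video, str(out_dir)), check=True)
    return out_dir
```

Writing PNG files keeps the extracted frames lossless, at the cost of larger files than JPEG.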

### `chop_blocks`

LoRA model manipulation tool for fine-grained, block-level control, using code from [resize_lora](https://github.com/elias-gaeros/resize_lora) by [Gaeros](https://github.com/elias-gaeros):

- Precise block-level filtering of LoRA models
- Sophisticated weight adjustment capabilities
- Full SafeTensors format support
- Detailed analysis and reporting of model structure
- Preserves model metadata during modifications
- Vector string format for block manipulation
- Supports both SDXL and SD1 naming conventions

<!-- ⚠️ TODO: add more scripts -->

## 🚀 Installation

---

1. Clone the repository (optional):

```bash
git clone https://huggingface.co/k4d3/toolkit
```

2. Add the repository to your `PATH` (optional). Use `$HOME` rather than `~`, because a tilde inside double quotes is not expanded:

```bash
export PATH="$PATH:$HOME/path/to/toolkit"
```

3. Source the provided `.zshrc` in your shell (optional; you will need to adapt it to your own setup):

```bash
source ~/path/to/toolkit/.zshrc
nano ~/.zshrc
```

## 📝 Requirements

---

- Miniconda, with an environment set up for training with sd-scripts and for inference with timm, llama, etc.
- ZSH shell (optional)
- CUDA-capable GPU (recommended)
- Required Python packages:
  - torch
  - transformers
  - pillow
  - pillow-jxl
  - opencv-python
  - numpy
  - and a lot more

## 🔧 Usage

---

Each script can be used independently or as part of a workflow. Here are some usage examples:

<!-- ⚠️ TODO: add more usage examples -->

### XY Plot

```bash
xyplot ./ComfyUI_00341_.png ./ComfyUI_00342_.png ./ComfyUI_00346_.png --column-labels "No LoRA" "minit-v1s6000.safetensors M:1.0 TE:1.0" "minit-v1s6000.safetensors M:1.40 TE:1.0" --rows 1 --output plot1.png
```

### JoyCaption

```bash
joy --feed-from-tags=10 --custom_prompt="Write a very long descriptive caption for this image in a formal tone. Do not mention feelings and emotions evoked by the image." .
```

### png2mp4

```bash
png2mp4 --repeat 16
```

### inject_to_txt

```bash
inject_to_txt 1_honovy "honovy"
```

### replace_comma_with_keep_tags_txt

```bash
replace_comma_with_keep_tags_txt 1 1_honovy
```

## 📦 Directory Structure

---

```text
~/
├── datasets/
├── output_dir/
├── models/
└── toolkit/
```

## 📄 License

---

[WTFPL](http://www.wtfpl.net/) - Do what the fuck you want with it.

The included data and models are copyrighted by their respective owners with their own licenses.

## 🤝 Contributing

---

Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.

## 📚 Documentation

---

If the documentation of a script is missing, ask a language model about it.