---
license: wtfpl
---

# AI Image Processing Toolkit

---

A collection of specialized scripts for AI image processing, dataset preparation, and model training workflows.

## 🛠️ Scripts Overview

---

### `wdv3`

An image tagging script using the WD V3 tagger models by [SmilingWolf](https://huggingface.co/SmilingWolf), based on [this repo](https://github.com/neggles/wdv3-timm). It supports multiple model architectures (ViT, SwinV2, ConvNeXt) and can process both single images and directories recursively.

#### Features

- Multiple model architecture support
- Batch processing capabilities
- Adjustable confidence thresholds
- CUDA acceleration with FP16 support
- JXL image format support

### `train_functions`

A set of ZSH functions for managing AI model training workflows:

- Script execution management
- Training variable setup
- Git repository state tracking
- Output directory management
- Automatic cleanup of empty outputs

### `git-wrapper`

Enhanced Git functionality for dataset management:

- Automatic submodule handling
- LFS integration for JXL files
- Dataset-specific Git attributes management

### `check4sig`

Dataset caption file watermark detection utility:

- Scans .caption files for watermark-related text
- Batch processing support
- Interactive editing with nvim
- Recursive directory scanning

### `gallery-dl`

Directory-aware wrapper for gallery-dl:

- Automatically changes to the ~/datasets directory
- Maintains consistent download locations
- Preserves original command functionality

### `joy`

Advanced image captioning system, [JoyCaption](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/tree/main) by [fancyfeast](https://huggingface.co/fancyfeast), using CLIP and an LLM:

- Multiple caption styles (descriptive, training prompts, art critic, etc.)
- Custom image adapters
- Tag-based caption generation
- Batch processing support

### `png2mp4`

Training progress visualization tool:

- Converts PNG sequences to MP4
- Customizable frame rates and durations
- Step counter overlay support
- Multiple sample handling

### `xyplot`

Image comparison grid generator:

- Supports multiple image formats
- Customizable grid layouts
- Optional row/column labels
- Automatic image padding and alignment

### `concat_captions`

Utility for combining multiple caption files:

- Merges .caption and .tags files
- Maintains original image associations
- Batch processing support
- Error handling for missing files

### `stats`

Directory analysis and statistics generation tool that provides detailed file counts and metrics:

- Detailed file counting by extension with color-coded output for different file types (JXL, PNG, JPG, etc.)
- Multiple sorting options (by name, count, or specific file types)
- Recursive directory scanning with aggregated statistics
- Color-coded thresholds for dataset size evaluation
- Automatic categorization of files into image and text groups
- Grand total calculations across all subdirectories

### `shortcode`

Hugo-compatible shortcode generator for image galleries with blurhash integration:

- Generates Hugo-compatible shortcode blocks for each image
- Integrates blurhash codes for progressive image loading
- Automatically extracts and includes image dimensions
- Preserves and integrates image captions from metadata
- Supports grid layout configurations
- Processes directories recursively while maintaining structure
- Handles relative path resolution for static content

### `yiffdata`

Comprehensive image metadata extraction and JSON generation utility:

- Extracts precise image dimensions using PIL
- Combines existing blurhash codes from .bh files
- Integrates caption data from .caption files
- Generates consolidated JSON output with all metadata
- Maintains original filename references
- Supports batch processing of
  entire directories
- Preserves file relationships and metadata hierarchy

### `txt2tags`

Batch file extension conversion utility for dataset management:

- Converts .txt files to .tags format for ML training compatibility
- Preserves original file content and structure
- Supports recursive directory traversal
- Interactive mode for selective conversion
- Maintains original file timestamps and permissions
- Simple command-line interface with directory input

### `txt2emoji`

Advanced text-to-emoji conversion system with context awareness:

- Sophisticated word-to-emoji mapping with custom dictionaries
- Context-aware emoji selection to avoid redundancy
- Detailed conversion explanations with rationale
- Batch processing with multiple output formats
- Configurable threshold and filtering options
- NLTK integration for improved text parsing
- Extensive customization options for emoji mappings

### `jtp2`

Image classification system using [RedRocket](https://huggingface.co/RedRocket)'s [PILOT2](https://huggingface.co/RedRocket/JointTaggerProject/tree/main/JTP_PILOT2) model:

- Implements Vision Transformer architecture with custom modifications
- Features GatedHead classifier for improved accuracy
- CUDA-accelerated inference with FP16 support
- Configurable confidence thresholds for tag generation
- Comprehensive batch processing capabilities
- Automatic tag file generation alongside images
- Supports multiple image formats including JXL

### `keyframe`

Efficient video keyframe extraction tool using FFmpeg:

- Extracts high-quality keyframes from video files
- Creates organized output directories automatically
- Maintains original frame quality and metadata
- Intelligent I-frame detection and extraction
- Sequential frame naming with padding
- Minimal quality loss during extraction
- Simple command-line interface

### `chop_blocks`

Advanced LoRA model manipulation tool for fine-grained control using code from
[resize-lora](https://github.com/elias-gaeros/resize_lora) by [Gaeros](https://github.com/elias-gaeros):

- Precise block-level filtering of LoRA models
- Sophisticated weight adjustment capabilities
- Full SafeTensors format support
- Detailed analysis and reporting of model structure
- Preserves model metadata during modifications
- Vector string format for block manipulation
- Supports both SDXL and SD1 naming conventions

## 🚀 Installation

---

1. Clone the repository (optional):

```bash
git clone https://huggingface.co/k4d3/toolkit
```

2. Add the repository to your PATH (optional):

```bash
# Use $HOME rather than ~ here: a tilde inside double quotes is not expanded.
export PATH="$PATH:$HOME/path/to/toolkit"
```

3. Source the bundled `.zshrc` from your shell (optional; you will need to adapt it to your setup):

```bash
source ~/path/to/toolkit/.zshrc
nano ~/.zshrc
```

## 📝 Requirements

---

- Miniconda with an environment set up for training with sd-scripts and for inference with timm, llama, etc.
- ZSH shell (optional)
- CUDA-capable GPU (recommended)
- Required Python packages:
  - torch
  - transformers
  - pillow
  - pillow-jxl
  - opencv-python
  - numpy
  - and a lot more

## 🔧 Usage

---

Each script can be used independently or as part of a workflow. Here are some usage examples:

### XY Plot

```bash
xyplot ./ComfyUI_00341_.png ./ComfyUI_00342_.png ./ComfyUI_00346_.png \
    --column-labels "No LoRA" "minit-v1s6000.safetensors M:1.0 TE:1.0" "minit-v1s6000.safetensors M:1.40 TE:1.0" \
    --rows 1 \
    --output plot1.png
```

### JoyCaption

```bash
joy --feed-from-tags=10 --custom_prompt="Write a very long descriptive caption for this image in a formal tone. Do not mention feelings and emotions evoked by the image." .
```

### png2mp4

```bash
png2mp4 --repeat 16
```

### inject_to_txt

```bash
inject_to_txt 1_honovy "honovy"
```

### replace_comma_with_keep_tags_txt

```bash
replace_comma_with_keep_tags_txt 1 1_honovy
```

## 📦 Directory Structure

---

```bash
~/
├── datasets/
├── output_dir/
├── models/
└── toolkit/
```

## 📄 License

---

[WTFPL](http://www.wtfpl.net/) - Do what the fuck you want with it. The included data and models are copyrighted by their respective owners and come with their own licenses.

## 🤝 Contributing

---

Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.

## 📚 Documentation

---

If the documentation of a script is missing, ask a language model about it.
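
Where a script has no documentation yet, a short sketch can help convey the behavior its description promises. The example below illustrates the `txt2tags` description (recursive `.txt` to `.tags` conversion that preserves file content). It is a minimal sketch and not the toolkit's actual implementation: the function name `txt_to_tags` and its `dry_run` parameter are hypothetical, and skipping files whose `.tags` counterpart already exists is an assumption.

```python
from pathlib import Path


def txt_to_tags(root, dry_run=False):
    """Rename every *.txt file under `root` to *.tags and return the new paths.

    Hypothetical helper for illustration; the real `txt2tags` script may differ.
    """
    renamed = []
    # sorted() materializes the listing first, so renames do not disturb the walk.
    for txt_path in sorted(Path(root).rglob("*.txt")):
        if not txt_path.is_file():
            continue
        tags_path = txt_path.with_suffix(".tags")
        if tags_path.exists():
            # Assumption: never overwrite an existing .tags file.
            continue
        if not dry_run:
            # A rename leaves content, modification time, and permissions untouched.
            txt_path.rename(tags_path)
        renamed.append(tags_path)
    return renamed
```

Calling `txt_to_tags(Path.home() / "datasets", dry_run=True)` would return the paths that would be created without renaming anything, which is a cheap way to preview the change before running it for real.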