Aloukik21/trainer

@@ -1,6 +1,17 @@
 # AI Trainer - RunPod Serverless
-Multi-model LoRA training service using [ai-toolkit](https://github.com/ostris/ai-toolkit).
 ## Supported Models
@@ -17,23 +28,68 @@ Multi-model LoRA training service using [ai-toolkit](https://github.com/ostris/a
 ## API Usage
 ### List Models
 ### Check Status
 ### Train LoRA
 ## RunPod Deployment
 ### Environment Variables
-- HF_TOKEN: HuggingFace token for gated models
 ### Model Caching
-For faster cold starts, set one of these HuggingFace repos in the RunPod "Model" field:
-- FLUX: black-forest-labs/FLUX.1-dev
-- Wan 2.2: ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16
-- Qwen: Qwen/Qwen-Image
-Note: RunPod only supports ONE cached model per endpoint.

+---
+license: mit
+tags:
+- lora
+- training
+- runpod
+- ai-toolkit
+---
 # AI Trainer - RunPod Serverless
+Single-endpoint multi-model LoRA training service using [ai-toolkit](https://github.com/ostris/ai-toolkit).
+The handler automatically frees GPU memory when switching between models.
 ## Supported Models
 ## API Usage
 ### List Models
+```json
+{"input": {"action": "list_models"}}
+```
 ### Check Status
+```json
+{"input": {"action": "status"}}
+```
+### Manual Cleanup
+```json
+{"input": {"action": "cleanup"}}
+```
 ### Train LoRA
+```json
+{
+  "input": {
+    "action": "train",
+    "model": "flux_dev",
+    "params": {
+      "dataset_path": "/workspace/dataset",
+      "output_path": "/workspace/output",
+      "steps": 1000,
+      "batch_size": 1,
+      "learning_rate": 1e-4,
+      "lora_rank": 16
+    }
+  }
+}
+```
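
A minimal sketch of how a client might submit the training request above. It assumes the endpoint is reached through RunPod's serverless REST API (`/run` on `https://api.runpod.ai/v2/<endpoint_id>`). The helper names `build_train_payload` and `build_run_request` and the placeholder credentials are hypothetical and not part of this repository. The request is only constructed here, not sent.

```python
import json
import urllib.request

RUNPOD_API_BASE = "https://api.runpod.ai/v2"  # RunPod serverless REST base URL


def build_train_payload(model, **params):
    """Wrap training parameters in the {"input": {...}} envelope shown above."""
    return {"input": {"action": "train", "model": model, "params": params}}


def build_run_request(endpoint_id, api_key, payload):
    """Prepare (but do not send) a POST to the endpoint's asynchronous /run route."""
    return urllib.request.Request(
        url=f"{RUNPOD_API_BASE}/{endpoint_id}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_train_payload(
    "flux_dev",
    dataset_path="/workspace/dataset",
    output_path="/workspace/output",
    steps=1000,
    batch_size=1,
    learning_rate=1e-4,
    lora_rank=16,
)
# Placeholder credentials; replace with a real endpoint ID and API key.
request = build_run_request("YOUR_ENDPOINT_ID", "YOUR_API_KEY", payload)
# To actually submit the job: urllib.request.urlopen(request)
```

The same envelope works for the other actions by swapping the `input` body, for example `{"input": {"action": "status"}}`.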
+## Training Parameters
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| dataset_path | Path to training images | /workspace/dataset |
+| output_path | Output directory | /workspace/output |
+| steps | Training steps | 2000 |
+| batch_size | Batch size | 1 |
+| learning_rate | Learning rate | 1e-4 |
+| lora_rank | LoRA rank | 16-32 |
+| save_every | Save checkpoint interval | 250 |
+| sample_every | Sample generation interval | 250 |
+| trigger_word | Trigger word for training | None |
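
To illustrate how the defaults in the table could combine with a request's `params`, here is a small sketch. It is not the handler's actual code: `DEFAULT_PARAMS` and `resolve_params` are hypothetical names, rejecting unknown keys is an assumption, and because the table gives `lora_rank` as a range (16-32), the value 16 is borrowed from the Train LoRA example.

```python
# Defaults copied from the table above. lora_rank is listed as "16-32";
# 16 is an assumption taken from the Train LoRA example.
DEFAULT_PARAMS = {
    "dataset_path": "/workspace/dataset",
    "output_path": "/workspace/output",
    "steps": 2000,
    "batch_size": 1,
    "learning_rate": 1e-4,
    "lora_rank": 16,
    "save_every": 250,
    "sample_every": 250,
    "trigger_word": None,
}


def resolve_params(user_params):
    """Overlay user-supplied values on the defaults, rejecting unknown keys."""
    unknown = set(user_params) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"Unknown training parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **user_params}


# "sks" is only an example trigger word.
resolved = resolve_params({"steps": 1000, "trigger_word": "sks"})
```

Any parameter omitted from the request keeps its table default, so a request that sets only `steps` still gets `save_every` and `sample_every` of 250.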
 ## RunPod Deployment
 ### Environment Variables
+- `HF_TOKEN`: Hugging Face token for gated models (required for FLUX and Qwen)
 ### Model Caching
+Models are cached at `/runpod-volume/huggingface-cache/hub/` for faster subsequent loads.
+For optimal cold starts, set the RunPod **Model** field to one of:
+- `black-forest-labs/FLUX.1-dev` (for FLUX training)
+- `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` (for Wan 2.2)
+- `Qwen/Qwen-Image` (for Qwen Image)
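
One way to check whether a model is already present in the cache directory mentioned above is sketched below. It relies on the standard Hugging Face hub cache layout, where each model repo lives in a `models--<org>--<name>` folder containing a `snapshots` directory. The helper names `cached_repo_dir` and `is_cached` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Cache location taken from the text above.
HUB_CACHE = Path("/runpod-volume/huggingface-cache/hub")


def cached_repo_dir(repo_id, cache_root=HUB_CACHE):
    """Return the folder the Hugging Face hub cache layout uses for a model repo."""
    return cache_root / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id, cache_root=HUB_CACHE):
    """True if at least one snapshot of the repo exists under the cache root."""
    snapshots = cached_repo_dir(repo_id, cache_root) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


flux_dir = cached_repo_dir("black-forest-labs/FLUX.1-dev")
```

Calling `is_cached("black-forest-labs/FLUX.1-dev")` inside the worker would then indicate whether a cold start needs to download the weights.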
+## Auto-Cleanup
+The handler automatically cleans up GPU memory when switching between models:
+- Full cleanup when the model type changes
+- Light cleanup when the same model is reused
+- Manual cleanup via the `cleanup` action
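
The rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the handler's implementation: `choose_cleanup` and `run_cleanup` are hypothetical names, the "none" level for a fresh worker is an assumption, and so is the split between what the light and full levels actually free. PyTorch is imported lazily so the sketch also runs where it is not installed.

```python
import gc


def choose_cleanup(previous_model, next_model):
    """Pick a cleanup level following the rules listed above."""
    if previous_model is None:
        return "none"   # assumption: nothing loaded yet, nothing to free
    if previous_model != next_model:
        return "full"   # model type changed
    return "light"      # same model requested again


def run_cleanup(level):
    """Free memory; what each level frees here is an illustrative assumption."""
    if level == "none":
        return
    gc.collect()  # drop unreachable Python objects first
    try:
        import torch
    except ImportError:
        return  # no PyTorch available, so only Python-level collection ran
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # release cached, unused GPU memory blocks
        if level == "full":
            torch.cuda.ipc_collect()  # also release memory held by CUDA IPC handles


# "qwen_image" is a made-up model key used only for this example.
level = choose_cleanup("flux_dev", "qwen_image")
```

A handler built this way would call `run_cleanup(choose_cleanup(previous, requested))` before loading a model, and `run_cleanup("full")` when it receives the `cleanup` action.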