Commit History

Update README.md
3469c86
verified

Nekochu committed on

Remove onnxruntime
2fda2eb
verified

Nekochu committed on

Revert to PyTorch INT8 (ONNX export produces NaN)
563ef1b
verified

Nekochu committed on

Reduce ORT memory: disable prepacking, basic optimization, 1 thread
10d0786
verified

Nekochu committed on

Fix: unsqueeze prompt embeddings
1620b49
verified

Nekochu committed on

Add onnxruntime dependency
ac9fb33
verified

Nekochu committed on

Switch to ONNX INT8 DiT
6ee4bac
verified

Nekochu committed on

fixes: cudagc guard, rm conditioner.py, turbo depth colormap, proper normal viz, compact UI, example
d65d5b5

Nekochu committed on

fix: .weight.dtype crashes on INT8 quantized Linear, use .float()
2aca4a6

Nekochu committed on

fix: use FP32 dtype on CPU (no bfloat16 autocast with INT8 model)
09dfa8a

Nekochu committed on

fix: load full torch.save model directly (no FP32 construction, mmap)
9d04602

Nekochu committed on

fix OOM: use mmap loading + assign (no memory copy)
b57f6e2

Nekochu committed on

fix: eager model load at startup (lazy load causes SSE timeout)
19a7c6b

Nekochu committed on

fix: move demo to module level (Gradio SDK needs it, not inside main())
284342e

Nekochu committed on

add example images for lazy-loaded examples
214c569

Nekochu committed on

switch to Gradio SDK (no Docker needed for pure Python)
2c1c6de

Nekochu committed on

fix: PYTHONUNBUFFERED for Docker log visibility
a73932a

Nekochu committed on

use pre-quantized INT8 model (no FP8 casting, no LoRA merge at runtime)
a3ba5ed

Nekochu committed on

fix: diffusers dep, layer-by-layer FP8->FP32 cast, LoRA merge in FP32, INT8 quant
551acb3

Nekochu committed on

FE2E depth+normal CPU Space: FP8 dynamic INT8, single denoise
405d2b1

Nekochu committed on