JoyAI-Echo — GGUF (for low-VRAM ComfyUI)

Quantized GGUF weights and extracted components for running JD's JoyAI-Echo — a text → video + audio model — on consumer GPUs.

The original release is a single ~46 GB bf16 checkpoint built for ~48 GB-class hardware. These files let it run on 8 GB VRAM + 16 GB RAM through the companion ComfyUI nodes.

▶ Nodes / how to run: https://github.com/RealRebelAI/ComfyUI_JoyAI_Echo_GGUF_Nodes

⚠️ Experimental. These run, but on small hardware loads and the first encode are slow. See the node repo for expectations and troubleshooting.

What's in this repo

File	What it is	Notes
`JoyAI-Echo-DiT-Q2_K.gguf`	Diffusion transformer, Q2_K	Smallest / lowest RAM. Start here on 8 GB.
`JoyAI-Echo-DiT-Q4_K_M.gguf`	Diffusion transformer, Q4_K_M	Higher quality, more RAM.
`joyai_echo_video_vae.safetensors`	Video VAE	Required.
`joyai_echo_audio_vae.safetensors`	Audio VAE	Required.
`joyai_echo_vocoder.safetensors`	Vocoder	Required for audio.
`joyai_echo_embeddings_processor.safetensors`	Text-embedding connector	Required.
`joyai_echo_config.json`	Architecture config	Also ships inside the node pack.

(Sizes are approximate; check the file listing. Q2_K is the lighter option, Q4_K_M the heavier/higher-quality one.)

Not included: the text encoder

JoyAI-Echo uses Gemma-3-12B as its text encoder. It isn't rehosted here — download gemma_3_12B_it_fp8_scaled.safetensors (Comfy-Org from google/gemma-3-12b-it.

Why GGUF (and why these specific files)

Stock ComfyUI GGUF loaders build the wrong architecture for JoyAI-Echo — it's a modified LTX-2.3 (different scale_shift_table and connector dimensions). The companion nodes rebuild it correctly with JoyAI's own configurator and keep the DiT weights packed in GGUF form, dequantizing on the fly so the model fits in limited RAM/VRAM.

The VAEs, vocoder, and connector are extracted from the original checkpoint as standalone files so you don't need the 46 GB bundle at runtime.

Usage

Install the ComfyUI nodes.
Download the files above and place them in the ComfyUI folders listed in the node README (models/unet, models/vae, models/audio_encoders, models/text_encoders).
Add the RebelsJE_StagedPipeline node → CreateVideo → SaveVideo, pick the files from the dropdowns, prompt, and queue.

Pick Q2_K first if you're on 8 GB; move up to Q4_K_M if you have RAM headroom and want better quality.

Hardware

Minimum target: 8 GB VRAM + 16 GB RAM (RTX 3070 class).
SSD strongly recommended — the files load much faster than from a spinning disk.

Acknowledgements

JoyAI-Echo — Echo Team @ Joy Future Academy, JD
LTX-2.3 — Lightricks
Gemma-3 — Google
GGUF tooling: city96/ComfyUI-GGUF

License

These quantized weights are derived from JoyAI-Echo and are released for academic research and non-commercial use only, following the upstream JoyAI-Echo license. Gemma components (downloaded separately) are governed by Google's Gemma license. By downloading you agree to those terms.

Downloads last month: 27

GGUF

Model size

22B params

Architecture

ltxv

Hardware compatibility

2-bit

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support