JoyAI-Echo β GGUF (for low-VRAM ComfyUI)
Quantized GGUF weights and extracted components for running JD's JoyAI-Echo β a text β video + audio model β on consumer GPUs.
The original release is a single ~46 GB bf16 checkpoint built for ~48 GB-class hardware. These files let it run on 8 GB VRAM + 16 GB RAM through the companion ComfyUI nodes.
βΆ Nodes / how to run: https://github.com/RealRebelAI/ComfyUI_JoyAI_Echo_GGUF_Nodes
β οΈ Experimental. These run, but on small hardware loads and the first encode are slow. See the node repo for expectations and troubleshooting.
What's in this repo
| File | What it is | Notes |
|---|---|---|
JoyAI-Echo-DiT-Q2_K.gguf |
Diffusion transformer, Q2_K | Smallest / lowest RAM. Start here on 8 GB. |
JoyAI-Echo-DiT-Q4_K_M.gguf |
Diffusion transformer, Q4_K_M | Higher quality, more RAM. |
joyai_echo_video_vae.safetensors |
Video VAE | Required. |
joyai_echo_audio_vae.safetensors |
Audio VAE | Required. |
joyai_echo_vocoder.safetensors |
Vocoder | Required for audio. |
joyai_echo_embeddings_processor.safetensors |
Text-embedding connector | Required. |
joyai_echo_config.json |
Architecture config | Also ships inside the node pack. |
(Sizes are approximate; check the file listing. Q2_K is the lighter option, Q4_K_M the heavier/higher-quality one.)
Not included: the text encoder
JoyAI-Echo uses Gemma-3-12B as its text encoder. It isn't rehosted here β download gemma_3_12B_it_fp8_scaled.safetensors (Comfy-Org from google/gemma-3-12b-it.
Why GGUF (and why these specific files)
Stock ComfyUI GGUF loaders build the wrong architecture for JoyAI-Echo β it's a modified LTX-2.3 (different scale_shift_table and connector dimensions). The companion nodes rebuild it correctly with JoyAI's own configurator and keep the DiT weights packed in GGUF form, dequantizing on the fly so the model fits in limited RAM/VRAM.
The VAEs, vocoder, and connector are extracted from the original checkpoint as standalone files so you don't need the 46 GB bundle at runtime.
Usage
- Install the ComfyUI nodes.
- Download the files above and place them in the ComfyUI folders listed in the node README (
models/unet,models/vae,models/audio_encoders,models/text_encoders). - Add the RebelsJE_StagedPipeline node β
CreateVideoβSaveVideo, pick the files from the dropdowns, prompt, and queue.
Pick Q2_K first if you're on 8 GB; move up to Q4_K_M if you have RAM headroom and want better quality.
Hardware
- Minimum target: 8 GB VRAM + 16 GB RAM (RTX 3070 class).
- SSD strongly recommended β the files load much faster than from a spinning disk.
Acknowledgements
- JoyAI-Echo β Echo Team @ Joy Future Academy, JD
- LTX-2.3 β Lightricks
- Gemma-3 β Google
- GGUF tooling: city96/ComfyUI-GGUF
License
These quantized weights are derived from JoyAI-Echo and are released for academic research and non-commercial use only, following the upstream JoyAI-Echo license. Gemma components (downloaded separately) are governed by Google's Gemma license. By downloading you agree to those terms.
- Downloads last month
- 27
2-bit
4-bit
