JoyAI-Echo β€” GGUF (for low-VRAM ComfyUI)

Quantized GGUF weights and extracted components for running JD's JoyAI-Echo β€” a text β†’ video + audio model β€” on consumer GPUs.

The original release is a single ~46 GB bf16 checkpoint built for ~48 GB-class hardware. These files let it run on 8 GB VRAM + 16 GB RAM through the companion ComfyUI nodes.

β–Ά Nodes / how to run: https://github.com/RealRebelAI/ComfyUI_JoyAI_Echo_GGUF_Nodes

⚠️ Experimental. These run, but on small hardware loads and the first encode are slow. See the node repo for expectations and troubleshooting.


What's in this repo

File What it is Notes
JoyAI-Echo-DiT-Q2_K.gguf Diffusion transformer, Q2_K Smallest / lowest RAM. Start here on 8 GB.
JoyAI-Echo-DiT-Q4_K_M.gguf Diffusion transformer, Q4_K_M Higher quality, more RAM.
joyai_echo_video_vae.safetensors Video VAE Required.
joyai_echo_audio_vae.safetensors Audio VAE Required.
joyai_echo_vocoder.safetensors Vocoder Required for audio.
joyai_echo_embeddings_processor.safetensors Text-embedding connector Required.
joyai_echo_config.json Architecture config Also ships inside the node pack.

(Sizes are approximate; check the file listing. Q2_K is the lighter option, Q4_K_M the heavier/higher-quality one.)

Not included: the text encoder

JoyAI-Echo uses Gemma-3-12B as its text encoder. It isn't rehosted here β€” download gemma_3_12B_it_fp8_scaled.safetensors (Comfy-Org from google/gemma-3-12b-it.


Why GGUF (and why these specific files)

Stock ComfyUI GGUF loaders build the wrong architecture for JoyAI-Echo β€” it's a modified LTX-2.3 (different scale_shift_table and connector dimensions). The companion nodes rebuild it correctly with JoyAI's own configurator and keep the DiT weights packed in GGUF form, dequantizing on the fly so the model fits in limited RAM/VRAM.

The VAEs, vocoder, and connector are extracted from the original checkpoint as standalone files so you don't need the 46 GB bundle at runtime.


Usage

  1. Install the ComfyUI nodes.
  2. Download the files above and place them in the ComfyUI folders listed in the node README (models/unet, models/vae, models/audio_encoders, models/text_encoders).
  3. Add the RebelsJE_StagedPipeline node β†’ CreateVideo β†’ SaveVideo, pick the files from the dropdowns, prompt, and queue.

Pick Q2_K first if you're on 8 GB; move up to Q4_K_M if you have RAM headroom and want better quality.


Hardware

  • Minimum target: 8 GB VRAM + 16 GB RAM (RTX 3070 class).
  • SSD strongly recommended β€” the files load much faster than from a spinning disk.

Acknowledgements


License

These quantized weights are derived from JoyAI-Echo and are released for academic research and non-commercial use only, following the upstream JoyAI-Echo license. Gemma components (downloaded separately) are governed by Google's Gemma license. By downloading you agree to those terms.

Screenshot (156)

Downloads last month
27
GGUF
Model size
22B params
Architecture
ltxv
Hardware compatibility
Log In to add your hardware

2-bit

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support