Can it be converted to GGUFs? What about ComfyUI support? Thanks. Looking good, but not sure if it's for a single-GPU consumer...
title
This. With a GGUF it might run on a high-end system (;
It's been out for 25 minutes or so... I mean, I doubt more than a few people have downloaded the full model yet.
Seems very unprudish, thanks for that. Keep up the good work.
just have a look! bye~
Yeah, a 4-bit model would fit snugly in my Ryzen AI Max+ 395, but I imagine it'll be pretty slow. They are planning distilled versions, though.
It's a MoE, so it will run far faster than if it were a dense 70B (70B models at 4-bit fit into two 24 GB GPUs, by the way). If you don't have enough memory, you can use dynamic CPU offload for some of the experts, which speeds up generation significantly. After all, people run GPT-OSS 120B and the Qwen 80B MoEs on their consumer hardware just fine (quantized, of course).
This. You'll probably need 64 GB of RAM, but this model should run at a decent speed on a reasonably good GPU. For a Q4 quant you can fit the active parameters plus ~8k context in around 12 GB of VRAM, so it might actually run at acceptable speeds on 12 GB+ cards, though of course the more VRAM the better (; (only if it's optimized correctly, though).
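Rough back-of-envelope for that 12 GB figure (the bits-per-weight and KV-cache numbers below are assumptions for a typical Q4-style quant, not measurements):
# 13B active params at ~4.5 bits/weight effective (Q4_K-ish), plus an assumed
# ~1-2 GiB of KV cache for 8k context
active_params = 13e9
bits_per_weight = 4.5
weights_gib = active_params * bits_per_weight / 8 / 2**30   # ~6.8 GiB
kv_cache_gib = 1.5                                           # rough guess
print(f"~{weights_gib:.1f} GiB active weights + ~{kv_cache_gib} GiB KV cache")
# -> roughly 8-9 GiB of "hot" data, which is how a 12 GB card could plausibly
#    cope while the rest of the 80B sits in system RAM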
It looks like a similar 80B-A13B MoE architecture to the earlier text LLM that was added to llama.cpp a while back: https://github.com/ggml-org/llama.cpp/pull/14425 (I worked on the ik_llama.cpp port, and the text model had some really strange issues with high perplexity, likely due to the MoE router implementation being "unique")...
With only 13B active, a somewhat quantized version should hopefully be easy enough to run on hybrid CPU+GPU, yeah...
Unfortunately, it's not as simple as changing a few lines in llama.cpp's convert_hf_to_gguf.py, as this model has differently named tensors:
$ numactl -N 1 -m 1 \
python \
convert_hf_to_gguf.py \
--outtype bf16 \
--split-max-size 50G \
--outfile /mnt/data/models/ubergarm/HunyuanImage-3.0-GGUF \
/mnt/data/models/tencent/HunyuanImage-3/
INFO:hf-to-gguf:Loading model: HunyuanImage-3
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-0001-of-0032.safetensors'
Traceback (most recent call last):
File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 8466, in modify_tensors
return [(self.map_tensor_name(name), data_torch)]
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 259, in map_tensor_name
raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'final_layer.model.0.emb_layers.1.bias'
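For context on where that blows up: the per-model modify_tensors() hook in convert_hf_to_gguf.py maps HF tensor names onto GGUF names, and the image-generation tensors (final_layer.* and friends) have no entry in the text-LLM tensor map. Purely as an illustration of where the hook lives (not a working conversion), a naive hack would be something like:
# Fragment for llama.cpp's convert_hf_to_gguf.py -- illustrative only.
# Dropping the image-head tensors would at best give a text-only GGUF; a real
# conversion needs proper GGUF names (and llama.cpp support) for the DiT/VAE side.
def modify_tensors(self, data_torch, name, bid):
    if name.startswith("final_layer."):   # image head, no mapping in the text tensor map
        return []                         # skip instead of raising ValueError
    return [(self.map_tensor_name(name), data_torch)]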
Plan B for me is to try to get the demo running on a big CPU-only AMD EPYC rig (thanks Wendell of level1techs for the hardware!!) with triton-cpu and search-replacing "cuda" with "cpu"...
lol... looks like 2 hours 45 minutes to generate a single 1024x1024 image so far...
4%|█▏        | 2/50 [06:20<2:31:38, 189.55s/it]
This is probably much easier if you have ~180GB VRAM or so.
Here is my procedure:
# modified from https://huggingface.co/tencent/HunyuanImage-3.0#%F0%9F%8F%A0-local-installation--usage
$ mkdir hi3 && cd hi3
$ uv venv ./venv --python 3.12 --python-preference=only-managed
$ source venv/bin/activate
$ git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
$ cd HunyuanImage-3.0/
$ uv pip install -r requirements.txt
$ uv pip install loguru torchvision
# I'll replace triton with triton-cpu for my use case, but if you have a GPU just try it
$ uv pip uninstall triton
# install triton-cpu from source following: https://github.com/triton-lang/triton-cpu/issues/237#issuecomment-2878180022
$ cd ..
$ git clone https://github.com/triton-lang/triton-cpu --recursive
$ cd triton-cpu
$ uv pip install ninja cmake wheel setuptools pybind11
$ MAX_JOBS=32 uv pip install -e python --no-build-isolation
$ cd ../HunyuanImage-3.0/
$ uv pip install tencentcloud-sdk-python # sketchy lol
$ export SOCKET=1
$ numactl -N "$SOCKET" -m "$SOCKET" \
python3 run_image_gen.py \
--model-id /mnt/data/models/tencent/HunyuanImage-3/ \
--verbose 1 \
--rewrite False \
--prompt "A cybernetic beaver is chewing on an ai robotic tree."
I also had to comment out the code that rewrites the prompt via the DeepSeek API, as it doesn't seem to listen to --rewrite 0, etc...
UPDATE
I ran a smaller 5-step gen just to test. It doesn't seem to honor passing in a size, e.g. --image-size 512x512, and always does 1024x1024... The main issue, though, is that it fails on decode: RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float, so I've gotta fuss with it some more.
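If anyone wants to poke at that decode error: "mixed dtype (CPU): expect parameter to have scalar type of Float" usually means some module is still in bf16/fp16 while the CPU kernel wants fp32. An untested workaround is to force the decode-side module to fp32 before calling it; "vae" below is just a guess at the attribute name, so inspect the actual pipeline object first:
import torch

def force_fp32_decode(pipe):
    # untested sketch -- 'vae' is a guess at the decoder attribute on the
    # HunyuanImage-3 pipeline object; check dir(pipe) for the real name
    decoder = getattr(pipe, "vae", None)
    if decoder is not None:
        decoder.to(dtype=torch.float32)   # cast decode path to fp32 for CPU
    return pipe
# heavier hammer (roughly doubles memory vs bf16): pipe.to(dtype=torch.float32)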
You can do it another way, but that depends on what the support ends up looking like: if it gets ComfyUI support it's straightforward to quant; if it only gets llama.cpp support, that's a different story.