How to run it in server mode?

#1
by usermma - opened

How can I run it locally in server mode? I tried to run it, but it failed. This is the code:

https://huggingface.co/spaces/usermma/supergemma4-e4b-abliterated-multimodal-gguf-4bit/blob/main/Dockerfile

The code you linked seems to be a way to run llama.cpp from Python, right? This is a UNet model, not a transformer-based LLM. Did you link to the wrong URL?

Okay, sorry. So there is no way to run it from GGUF with something like llama.cpp?

Unfortunately no, it's entirely outside the scope of llama.cpp. llama.cpp's purpose is to run inference on LLMs. This repo contains different quantizations of an image generation model based on SDXL. They're meant to be run with tools like ComfyUI.
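
For what it's worth, GGUF is only a container format, so a `.gguf` extension by itself doesn't tell you which runtime can load the file. Here is a minimal sketch of reading the fixed-size header, assuming the GGUF v2/v3 little-endian layout (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count). `read_gguf_header` is a hypothetical helper name, not part of llama.cpp or ComfyUI, and the sample bytes are synthetic rather than taken from this repo:

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header (assumes v2/v3, little-endian)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # "<IQQ": little-endian uint32 version, uint64 tensor count, uint64 kv count
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for illustration only; a real file would be read with
# open(path, "rb").read(HEADER_SIZE).
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(sample))
```

The header only confirms that the file is GGUF. The model architecture is normally recorded in the metadata that follows it (the `general.architecture` key), which this sketch does not parse, and that value is what decides whether a given tool can run the model.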

InsecureErasure changed discussion status to closed

Okay, good. I will take the time to learn how to use it in ComfyUI. If it doesn't work, I will keep trying...
