How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="librepowerai/Qwen3-8B-Power",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Qwen3-8B โ€” Q4_K_M for IBM Power (Linux ppc64le + AIX)

Qwen3-8B quantized to Q4_K_M with a Q6_K output head for fast CPU inference on IBM Power โ€” POWER9 (VSX) and POWER10/11 (MMA-accelerated) via LibrePower. No GPU required. Size: 4.7G.

Run it

Ubuntu / Debian ppc64le:

curl -fsSL https://linux.librepower.org/install.sh | sudo sh
sudo apt install librepower-llama
wget https://huggingface.co/librepowerai/Qwen3-8B-Power/resolve/main/Qwen3-8B-Q4_K_M.gguf
lp-llama-completion -m Qwen3-8B-Q4_K_M.gguf -p "Hello!" -n 64 -t $(nproc)

IBM AIX 7.3 (big-endian):

dnf install llama-aix
wget https://huggingface.co/librepowerai/Qwen3-8B-Power/resolve/main/Qwen3-8B-Q4_K_M-be.gguf
lp-llama-completion -m Qwen3-8B-Q4_K_M-be.gguf -p "Hello!" -n 64 -t $(nproc)

Files

  • Qwen3-8B-Q4_K_M.gguf โ€” little-endian (Ubuntu/Linux ppc64le)
  • Qwen3-8B-Q4_K_M-be.gguf โ€” big-endian (IBM AIX)

Good for

Top-quality 8B: complex reasoning, code, RAG, agents (hybrid thinking โ€” use /no_think for low latency)

Credits

Base model by its original authors (Apache-2.0). Quantization & Power packaging: LibrePower.

Downloads last month
42
GGUF
Model size
8B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for librepowerai/Qwen3-8B-Power

Finetuned
Qwen/Qwen3-8B
Quantized
(317)
this model