You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

This is a quantized redistribution of a gated research artifact with reduced refusal behavior. By requesting access you affirm that you are of legal age in your jurisdiction, that you will not use it to generate CSAM, CBRN, or mass-harm content, and that you accept the original model license and disclaimer.

Log in or Sign Up to review the conditions and access this model content.

GLM-5.2-Uncensored - GGUF

As a reaction to the CEO of a fronteir model lobbying the media + congress to boogeyman open source models, I release:

Imatrix GGUF quantizations of zandenAI/GLM-5.2-FP8-Uncensored, an abliterated (refusal-removed) build of zai-org/GLM-5.2-FP8 - a 754B-parameter Mixture-of-Experts model.

All credit for the abliteration methodology and the source weights goes to Zanden Kane (@zandenkane). This repository only provides GGUF conversions for local inference.

Method

  • Source: FP8 -> dequantized to BF16 -> GGUF (the BF16 source is included here).
  • Quants: built with an importance matrix (imatrix) and a dynamic, MoE-aware recipe - experts at the target bit-width, with token-embeddings / output / attention kept higher (Q8_0 / Q6_K).
  • Calibration: standard public imatrix calibration corpus.

Files

File Type ~Size Notes
glm52-BF16-*.gguf BF16 ~1.5 TB full-precision source; re-quant from this
glm52-IQ1_M.gguf IQ1_M ~180 GB smallest; big-RAM / Mac
glm52-IQ2_M.gguf IQ2_M ~250 GB
glm52-IQ3_XXS.gguf IQ3_XXS ~290 GB
glm52-IQ4_XS.gguf IQ4_XS ~400 GB best quality-per-byte at 4-bit
glm52-Q4_K_M.gguf Q4_K_M ~450 GB robust 4-bit default
glm52-Q6_K.gguf Q6_K ~617 GB near-lossless
glm52.imatrix imatrix small roll your own quant levels

(Quants upload as they finish - some may still be in progress.)

Run with llama.cpp

./llama-server -m glm52-IQ4_XS.gguf -ngl 99 -c 16384 --host 0.0.0.0 --port 8080

For the sharded BF16, download all parts and point llama.cpp at the first shard; it auto-loads the rest.

Hardware guide

Quant Fits on
IQ1_M / IQ2_M 256 GB RAM box, 192-256 GB Mac, or 3-4x A100
IQ3_XXS / IQ4_XS 384-512 GB RAM, 512 GB Mac, or 6-8x A100
Q4_K_M / Q6_K 8x A100 / H100, or large-RAM CPU/offload

Safety

The source model removes general refusals but preserves categorical refusal for CSAM / minor-exploitation content. Use responsibly and in accordance with your local laws. The original license and disclaimer apply in full.

License

MIT (inherited from the base model).

Credits

  • Base model: Z.ai / zai-org - GLM-5.2-FP8
  • Abliteration + source weights: Zanden Kane - zandenAI/GLM-5.2-FP8-Uncensored
  • GGUF conversion (imatrix + dynamic recipe): this repository
Downloads last month
-
GGUF
Model size
753B params
Architecture
glm-dsa
Hardware compatibility
Log In to add your hardware

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for phaseonx11/GLM-5.2-Uncensored-GGUF

Quantized
(1)
this model