HRM-Text-1B GGUF

This repository contains a BF16 GGUF conversion of sapientinc/HRM-Text-1B and validated Q8_0, Q6_K, and Q5_K_M quantizations derived from that BF16 GGUF.

The GGUF files use:

  • general.architecture = hrm_text
  • BF16 source tensor storage or standard llama.cpp quantized tensor storage
  • the original tokenizer from tokenizer.json
  • no injected chat template
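One quick way to confirm that a downloaded file declares the architecture listed above is to read the start of the GGUF container directly. The following is a minimal sketch, not part of this repository: it assumes the GGUF v2 or later header layout (magic, `uint32` version, `uint64` tensor count, `uint64` metadata count) and it only inspects the first metadata entry, on the assumption that `general.architecture` is written first, which is common but not guaranteed. The function names are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # value-type id used for strings in GGUF metadata


def read_gguf_header(fh):
    """Read magic, version, tensor count, and metadata count (GGUF v2+ layout)."""
    if fh.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # Little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", fh.read(20))
    return version, tensor_count, kv_count


def read_gguf_string(fh):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    (length,) = struct.unpack("<Q", fh.read(8))
    return fh.read(length).decode("utf-8")


def first_metadata_entry(path):
    """Return (key, value) for the first metadata entry, or None if there is none.

    The value is returned only when it is a string; otherwise it is None.
    Assumption: the entry of interest (general.architecture) comes first.
    """
    with open(path, "rb") as fh:
        _version, _tensor_count, kv_count = read_gguf_header(fh)
        if kv_count == 0:
            return None
        key = read_gguf_string(fh)
        (value_type,) = struct.unpack("<I", fh.read(4))
        value = read_gguf_string(fh) if value_type == GGUF_TYPE_STRING else None
        return key, value
```

For a file from this repository, `first_metadata_entry("HRM-Text-1B-BF16.gguf")` would be expected to return the key `general.architecture` with the value `hrm_text` if the ordering assumption holds. A full metadata walk is better done with a proper GGUF reader.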

This is not a chat model and is not instruction tuned. "Useful output" for this repository means alignment with the original Transformers model on the same prompt, not chat-assistant behavior.

Compatibility Notice

Standard upstream llama.cpp, Ollama, LM Studio, and llama-cpp-python are not expected to load these files until hrm_text is supported upstream.

Use the included patch:

runtime/llama.cpp-hrm_text.patch

The patch was built against:

ggml-org/llama.cpp commit 6a257d44633d4a752183ed778b88d2924d0a6b9d

Only the normal causal generation path is implemented in the patched runtime. Prefix-LM bidirectional attention driven by token_type_ids is not supported by the llama.cpp path in this release.
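To illustrate the difference between the two attention modes, here is a small conceptual sketch of the masks involved. It is not taken from the model or the patch: it assumes the usual prefix-LM convention, in which positions marked `1` in `token_type_ids` form a prefix that attends to itself bidirectionally while all other positions stay causal. The actual meaning of `token_type_ids` in HRM-Text-1B may differ, and the function names are hypothetical.

```python
def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


def prefix_lm_mask(token_type_ids):
    """Causal mask plus bidirectional attention inside the prefix.

    Assumption: token_type_ids[i] == 1 marks a prefix token. Two prefix
    tokens may always see each other; every other pair follows the causal rule.
    """
    n = len(token_type_ids)
    return [
        [j <= i or (token_type_ids[i] == 1 and token_type_ids[j] == 1)
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Two prefix tokens followed by two generated tokens.
    for row in prefix_lm_mask([1, 1, 0, 0]):
        print("".join("x" if allowed else "." for allowed in row))
```

With all-zero `token_type_ids` the two masks coincide, which corresponds to the causal-only path that the patched runtime supports.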

Files

File | Description
--- | ---
HRM-Text-1B-BF16.gguf | BF16 GGUF conversion of sapientinc/HRM-Text-1B
HRM-Text-1B-Q8_0.gguf | Validated Q8_0 quantization from BF16
HRM-Text-1B-Q6_K.gguf | Validated Q6_K quantization from BF16
HRM-Text-1B-Q5_K_M.gguf | Validated Q5_K_M quantization from BF16
runtime/llama.cpp-hrm_text.patch | Patch adding hrm_text conversion and runtime support to the clean llama.cpp base commit
reports/validation/final_report.md | Human-readable conversion and validation report
reports/validation/quantization_report.md | Quantization report, hashes, and pass/fail summary
reports/validation/baseline_transformers.json | Transformers baseline prompts, logits, and continuations
reports/validation/bf16_tensor_validation.json | Tensor-level GGUF validation
reports/validation/bf16_vs_hf.json | Runtime logit and text validation
reports/validation/q8_0_vs_bf16.json | Q8_0 vs BF16 runtime validation
reports/validation/q6_k_vs_bf16.json | Q6_K vs BF16 runtime validation
reports/validation/q5_k_m_vs_bf16.json | Q5_K_M vs BF16 runtime validation

Provenance

Item | Value
--- | ---
Source model | sapientinc/HRM-Text-1B
Source snapshot SHA | 2285b999f6fb8a5b16e0cc313a9e8e4fe447140d
Source model.safetensors SHA256 | F8FE2B2BF6948414E8E8D6538659198726D98F967C55B533B7AABE8A1FA9A584
BF16 GGUF SHA256 | 2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010
BF16 GGUF size | 2,367,995,648 bytes
llama.cpp base commit | 6a257d44633d4a752183ed778b88d2924d0a6b9d

Available GGUF Files

Variant | File | Size (bytes) | SHA256
--- | --- | --- | ---
BF16 | HRM-Text-1B-BF16.gguf | 2367995648 | 2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010
Q8_0 | HRM-Text-1B-Q8_0.gguf | 1259126560 | C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D
Q6_K | HRM-Text-1B-Q6_K.gguf | 972668704 | 24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C
Q5_K_M | HRM-Text-1B-Q5_K_M.gguf | 851509024 | F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7
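The hashes above can be checked after download. The following is a minimal sketch using only the Python standard library; the digests are copied from the table, while the helper names and the download directory `HRM-Text-1B-GGUF` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the table above (uppercase hex).
EXPECTED_SHA256 = {
    "HRM-Text-1B-BF16.gguf": "2DD5E2EF55E40C46DB0D0CB4CF1427A4E72DA34FEE36F0D2B73D081D0E1C2010",
    "HRM-Text-1B-Q8_0.gguf": "C0729C267C3421E1F6DE0488AC5448E98EA30E56514DAF210596B70AC3F9786D",
    "HRM-Text-1B-Q6_K.gguf": "24D93CA4EF4A02CFE415E3EA56A78AD65198A165A4157B928004B58DBDA2D93C",
    "HRM-Text-1B-Q5_K_M.gguf": "F6CE71A076EC897174C555D810ED6E379767D52F9396D485B42E42BF8DB1D0B7",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large GGUF files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_directory(directory):
    """Return {filename: True | False | None}; None means the file is absent."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = (sha256_of(path) == expected) if path.is_file() else None
    return results
```

Calling `verify_directory("HRM-Text-1B-GGUF")` reports each variant as matching, mismatching, or not downloaded, so a partial download of a single quantization can still be checked.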

Validation Summary

Validation was performed from a clean source snapshot and a clean llama.cpp base checkout.

Check | Result
--- | ---
Tensor validation | Pass, 259/259 tensors found and compared
Tensor values | BF16 tensor bits match HF after expected BF16 conversion
Prompt token IDs | Match for all validation prompts
Next-token top-1 | Match on 4/4 prompts
Top-10 overlap | 10/10 for all prompts
Text validation | BF16 GGUF continuations are aligned with Transformers baseline

Quantized variants were validated against the BF16 GGUF:

Variant | Token IDs | Top-1 matches | Min top-10 overlap | New loop check | Result
--- | --- | --- | --- | --- | ---
Q8_0 | Pass | 4/4 | 9/10 | Pass | Pass
Q6_K | Pass | 4/4 | 9/10 | Pass | Pass
Q5_K_M | Pass | 4/4 | 9/10 | Pass | Pass

Full-vocab mean absolute logit error:

Prompt | MAE
--- | ---
The quick brown fox | 0.0199148655
In a distant future, humanity | 0.0051696529
Question: What is 2+2?\nAnswer: | 0.0076530445
def fibonacci(n): | 0.0045031775
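For readers who want to reproduce this kind of comparison, the three logit metrics reported above (top-1 match, top-10 overlap, and full-vocab mean absolute error) can be computed from two next-token logit vectors roughly as follows. This is a sketch of the standard definitions in plain Python; the validation scripts behind the reports may compute them differently, and the function names are hypothetical.

```python
def top_k_ids(logits, k):
    """Token ids of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def compare_logits(reference, candidate, k=10):
    """Compare two next-token logit vectors over the same vocabulary.

    reference: logits from the baseline (for example the Transformers model)
    candidate: logits from the runtime under test (for example a GGUF file)
    """
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must cover the same vocabulary")
    ref_top = top_k_ids(reference, k)
    cand_top = top_k_ids(candidate, k)
    mae = sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)
    return {
        "top1_match": ref_top[0] == cand_top[0],
        "topk_overlap": len(set(ref_top) & set(cand_top)),  # out of k
        "mae": mae,
    }


if __name__ == "__main__":
    # Toy 4-token vocabulary, purely illustrative values.
    print(compare_logits([2.0, 0.5, -1.0, 3.0], [2.1, 0.4, -1.0, 2.9], k=2))
```

Overlap is counted as a set intersection, so a result of 9/10 means one token entered or left the top ten, regardless of how the remaining nine are ordered.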

The original model already repeats on some prompts. Repetition by itself is not treated as a conversion failure unless it is newly introduced by the GGUF runtime. The BF16 GGUF validation did not reproduce the unrelated garbage pattern seen in a previous broken conversion attempt.
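The rule described above, that repetition counts as a failure only when the GGUF runtime newly introduces it, can be expressed as a comparison between the baseline continuation and the candidate continuation. The sketch below is one hypothetical way to implement such a check on token ids; the detector, its thresholds, and the function names are assumptions and are not the check used in the validation reports.

```python
def has_repetition_loop(token_ids, max_ngram=8, min_repeats=3):
    """True if some n-gram (n <= max_ngram) occurs min_repeats times back to back."""
    total = len(token_ids)
    for n in range(1, max_ngram + 1):
        span = n * min_repeats
        for start in range(total - span + 1):
            unit = token_ids[start:start + n]
            if all(
                token_ids[start + i * n:start + (i + 1) * n] == unit
                for i in range(1, min_repeats)
            ):
                return True
    return False


def newly_introduced_loop(baseline_ids, candidate_ids, **kwargs):
    """Flag only loops that appear in the candidate but not in the baseline."""
    return (
        has_repetition_loop(candidate_ids, **kwargs)
        and not has_repetition_loop(baseline_ids, **kwargs)
    )
```

Under this rule a prompt on which the baseline model itself loops never fails the check, which matches the intent stated above.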

Example Runtime Setup

Download this repository:

pip install -U huggingface_hub
hf download sinimiini/HRM-Text-1B-GGUF --local-dir HRM-Text-1B-GGUF

Patch and build llama.cpp:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 6a257d44633d4a752183ed778b88d2924d0a6b9d
git apply ..\HRM-Text-1B-GGUF\runtime\llama.cpp-hrm_text.patch
cmake -B build -S . -DGGML_NATIVE=OFF
cmake --build build --config Release --target llama-cli llama-completion llama-results

Run a short causal-generation smoke test:

.\build\bin\Release\llama-cli.exe -m ..\HRM-Text-1B-GGUF\HRM-Text-1B-BF16.gguf -p "The quick brown fox" -n 32 --temp 0 --no-conversation

Depending on the CMake generator and the llama.cpp build type, the executable may be under build\bin\llama-cli.exe instead of build\bin\Release\llama-cli.exe.
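When scripting the smoke test, the two possible output locations can be probed instead of hard-coded. This is a small sketch, assuming the Windows executable names used above; the helper names are hypothetical, and the command-line flags are the ones from the smoke-test command in this section.

```python
from pathlib import Path

# Candidate locations relative to the llama.cpp checkout, most specific first.
CANDIDATES = (
    Path("build") / "bin" / "Release" / "llama-cli.exe",
    Path("build") / "bin" / "llama-cli.exe",
)


def find_llama_cli(repo_root):
    """Return the first existing llama-cli path under repo_root, or None."""
    for relative in CANDIDATES:
        candidate = Path(repo_root) / relative
        if candidate.is_file():
            return candidate
    return None


def smoke_test_command(cli_path, model_path, prompt="The quick brown fox"):
    """Build the argument list for the causal-generation smoke test."""
    return [
        str(cli_path), "-m", str(model_path),
        "-p", prompt, "-n", "32", "--temp", "0", "--no-conversation",
    ]
```

The resulting list can be passed to `subprocess.run`. Returning `None` rather than raising leaves it to the caller to decide whether a missing binary should trigger a build.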

Limitations

  • hrm_text is a custom GGUF architecture in this conversion.
  • Generic GGUF runners will not work until they implement the HRM runtime graph.
  • Prefix-LM bidirectional attention with token_type_ids is not implemented in the patched llama.cpp path.

License

The source model is released under the Apache 2.0 license. See LICENSE.
