Signed int64 overflow in llama.cpp legacy GGML converter (`convert_llama_ggml_to_gguf.py`) — negative tensor length causes infinite loop + unbounded memory growth (DoS) on load of an untrusted `.ggml` file

Target / scope

Affected tool: convert_llama_ggml_to_gguf.py, the legacy GGML/GGMF/GGJT → GGUF conversion script shipped in the llama.cpp repository (ggml-org/llama.cpp, master). It imports and depends on the gguf Python package (GGMLQuantizationType, GGML_QUANT_SIZES).
Format: GGML legacy (.ggml, magics lmgg/fmgg/tjgg = GGML/GGMF/GGJT). Reproduced against GGJT v3.
Vulnerability class: signed-integer (int64) overflow → negative length → non-terminating loop + unbounded allocation = Denial-of-Service on model load.
Huntr target dropdown: there is no exact match. The closest in-scope library is gguf (the bug is reachable via the gguf-dependent converter that ships in the llama.cpp repo). See the explicit Scope note below — this should be confirmed with the program before submission.

Severity (honest)

Low / DoS tier. This is a pure denial-of-service: an infinite loop with an unbounded list.append, i.e. a CPU spin plus monotonic memory growth until the process is OOM-killed. There is no code execution and no file read, write, or disclosure. The benign /etc/passwd read and temp-marker lines in reproduce.py only illustrate the threat model (untrusted input on a real box); they are not an exploit primitive.

Per the format rubric, RCE > file-access > DoS, and .gguf/.keras/.joblib/.safetensors/TF-SavedModel sit at the ~$4k tier. This finding is not RCE and does not touch the modern .gguf loader; it is a DoS in a legacy GGML converter, so the realistic value is at the bottom of the range (DoS, low hundreds at most), not $4k. It is reported as a robustness/DoS bug, not as memory corruption leading to RCE.

Note on the original triage label "out-of-bounds read / memory corruption": I could not substantiate an OOB read or memory corruption. The negative len_bytes does not index past the buffer; whether the data comes from numpy memmap or frombuffer, the negative length only rewinds or freezes the offset, and the parser keeps re-reading in-bounds bytes. The demonstrable, reproducible impact is infinite loop + unbounded memory = DoS. I am downgrading the claim accordingly rather than overstating it.

Summary

convert_llama_ggml_to_gguf.py parses an untrusted legacy GGML file. For each tensor it reads a dimension count (asserted 0 <= n_dims <= 4) and then that many unsigned 32-bit dimension values, which are not bounds-checked. It computes the element count as np.prod(dims), scales it by the type size to get the tensor byte length, and immediately advances the file offset by that length. Because np.prod over the dimension tuple is evaluated in signed int64, a true product of 2**63 or more silently wraps modulo 2**64 and can come out negative. A negative byte length can cancel or exceed the header bytes already consumed, making the per-tensor offset delta <= 0, so the outer while offset < len(data) loop never advances while it appends a new tensor object every iteration: a non-terminating loop with unbounded memory growth. Merely loading a crafted 288-byte file triggers it, before any tensor data is read or validated.

Root cause (file:line)

convert_llama_ggml_to_gguf.py, Tensor.load (lines 110–131):

def load(self, data, offset):
    orig_offset = offset
    (n_dims, name_len, dtype) = struct.unpack('<3I', data[offset:offset + 12])
    assert n_dims >= 0 and n_dims <= 4, f'Invalid tensor dimensions {n_dims}'   # L113: only COUNT bounded
    assert name_len < 4096, 'Absurd tensor name length'
    ...
    self.dims = struct.unpack(f'<{n_dims}I', data[offset:offset + (4 * n_dims)])  # L120: dim VALUES unbounded uint32
    offset += 4 * n_dims
    self.name = bytes(data[offset:offset + name_len]); offset += name_len
    pad = ((offset + 31) & ~31) - offset if self.use_padding else 0; offset += pad
    n_elems = np.prod(self.dims)                                                  # L126: int64 -> SIGNED OVERFLOW
    n_bytes = np.int64(np.int64(n_elems) * np.int64(tysize)) // np.int64(blksize) # L127: negative
    self.start_offset = offset
    self.len_bytes = n_bytes                                                      # L129: negative
    offset += n_bytes                                                             # L130: offset does NOT advance (or rewinds)
    return offset - orig_offset                                                   # L131: delta <= 0

convert_llama_ggml_to_gguf.py, GGMLModel.load (line 190):

while offset < len(data):
    tensor = Tensor(use_padding = self.file_format > GGMLFormat.GGMF)
    offset += tensor.load(data, offset)   # L192: delta <= 0 -> offset frozen/rewound
    tensor_map[tensor.name] = len(tensors)
    tensors.append(tensor)                # L194: list grows every iteration -> OOM

The only guard is on the dimension count (n_dims <= 4); the dimension values flow unchecked into np.prod. n_elems and n_bytes are never validated to be non-negative before being used as an offset delta.
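
The interaction between the two excerpts can be modelled without the real parser. The sketch below is a hypothetical stand-in: simulate_outer_loop and its max_iters cap are not part of the converter, and a constant delta takes the place of Tensor.load. It only illustrates that a non-positive delta keeps the loop condition true while the list grows on every pass.

```python
def simulate_outer_loop(data_len, start_offset, delta, max_iters=1000):
    """Model of `while offset < len(data)` with a constant per-tensor delta.

    Hypothetical helper for illustration; the real loop has no iteration cap.
    Returns (final_offset, tensors_appended, terminated_normally).
    """
    offset = start_offset
    tensors = []
    for _ in range(max_iters):
        if not offset < data_len:
            return offset, len(tensors), True
        offset += delta              # stands in for `offset += tensor.load(data, offset)`
        tensors.append(object())     # one new entry per iteration, as in the real loop
    return offset, len(tensors), False   # cap reached: the real loop would keep going


# A positive delta walks to the end of the data and the loop exits.
print(simulate_outer_loop(data_len=288, start_offset=64, delta=112))
# A zero delta never moves the offset, so only the cap stops the model.
print(simulate_outer_loop(data_len=288, start_offset=64, delta=0))
```

The cap exists only so the model returns; in the converter nothing bounds the iteration count, which is why memory grows until the process is killed.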

Proof of concept

Overflow primitive

dims = (0xFFFFFFFE, 0x80000001, 2, 16)
true product = 295147905179352825792   (~2.95e20, > 2**63)
np.prod(dims)            -> np.int64(-64)        # silent signed-int64 overflow
dtype I8 => (blksize=1, tysize=1) => n_bytes = -64
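
The arithmetic above can be checked without numpy. The following is a minimal sketch that models two's-complement int64 wraparound in pure Python (wrap_int64 is a hypothetical helper, not a numpy or converter function); reducing the exact product modulo 2**64 gives the same value as a chain of wrapping int64 multiplications.

```python
import math


def wrap_int64(value):
    """Reduce an arbitrary Python int to the value a signed 64-bit integer would hold."""
    value &= (1 << 64) - 1                      # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value


dims = (0xFFFFFFFE, 0x80000001, 2, 16)
true_product = math.prod(dims)                  # exact, arbitrary-precision result
wrapped = wrap_int64(true_product)              # what a wrapping int64 product ends up as

print(true_product, true_product > 2**63, wrapped)
```

The true product is 2**68 - 64, so its low 64 bits are 2**64 - 64, which reads as -64 when interpreted as signed.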

Crafted file (built by the PoC)

A valid GGJT v3 file: tjgg + version 3, 7×uint32 hyperparameters (ftype = MOSTLY_F16, eligible for the conversion path), a 3-item vocab, then one tensor whose header declares n_dims=4, dtype I8, and the four overflowing dims above. The tensor name length is sized so the bytes consumed by the (32-byte-aligned) tensor header equal exactly 64, which cancels the -64 byte-length so the per-iteration delta is exactly 0 — the parser re-reads the identical malicious header forever. Total file size: 288 bytes.
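
The cancellation described above can be sketched as plain offset arithmetic. This is an illustrative model of the header layout shown in Tensor.load, not the PoC itself: tensor_delta is a hypothetical helper, and name_len=8 is an assumed value, since the report does not state the exact name length the PoC uses.

```python
def tensor_delta(start, n_dims, name_len, n_bytes, use_padding=True):
    """Return (header_bytes, delta) for one tensor record starting at `start`.

    Mirrors the offset bookkeeping in the Tensor.load excerpt: a 12-byte fixed
    header, 4 bytes per dimension, the name, then optional 32-byte alignment.
    """
    offset = start + 12                  # n_dims, name_len, dtype: three uint32 fields
    offset += 4 * n_dims                 # dimension values
    offset += name_len                   # tensor name bytes
    if use_padding:
        offset = (offset + 31) & ~31     # round up to the next 32-byte boundary
    header_bytes = offset - start
    return header_bytes, header_bytes + n_bytes


# Tensor record at offset 64, four dims, assumed 8-byte name, wrapped length of -64.
print(tensor_delta(start=64, n_dims=4, name_len=8, n_bytes=-64))

# Every name length for which the padded header is exactly 64 bytes (delta == 0).
stalling = [n for n in range(64) if tensor_delta(64, 4, n, -64)[1] == 0]
print(stalling[0], stalling[-1])
```

Under these assumptions any name length from 5 to 36 bytes pads the header to 64 bytes, so the zero-delta construction does not depend on one specific name.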

How the PoC runs the genuine parser

poc_final.py and reproduce.py import the real convert_llama_ggml_to_gguf module and call the genuine Tensor.load for every iteration, exactly as GGMLModel.load does; only the outer while is wrapped with an iteration watchdog so the harness can report the hang instead of freezing. A second check calls the 100% unmodified GGMLModel.load(data, 0) in a thread and observes it never returns.
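
The second check follows a common pattern: run the call in a thread, join with a timeout, and see whether the thread is still alive. The sketch below shows that pattern only; it is not the PoC's code. returns_within and spins_until_stopped are hypothetical names, and the spinning function is a stand-in for the real GGMLModel.load call.

```python
import threading


def returns_within(target, timeout_s):
    """Run target() in a daemon thread; report whether it finished before the timeout."""
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return not worker.is_alive()


stop = threading.Event()


def spins_until_stopped():
    # Stand-in for a call that never returns. The Event exists only so this
    # demo can shut its worker down; a genuinely hung call offers no such hook.
    while not stop.is_set():
        stop.wait(0.01)


print(returns_within(lambda: None, timeout_s=1.0))          # finishes immediately
print(returns_within(spins_until_stopped, timeout_s=0.2))   # still running at timeout
stop.set()                                                  # let the demo worker exit
```

A daemon thread is used so that a truly non-terminating call cannot keep the interpreter alive after the harness has reported its result.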

Captured output (re-verified 2026-06-15, Windows, Python 3.12.10, numpy 2.4.6, `gguf` installed)

[*] np.prod(('0xfffffffe', '0x80000001', '0x2', '0x10')) = np.int64(-64)  (dtype int64); true product = 295147905179352825792 (> 2**63 = True) => silent signed-int64 overflow
[+] wrote malicious legacy GGML: ...\Temp\malicious_legacy.ggml (288 bytes)
[*] file=288B  offset@tensor-loop=64  format=GGJTv3
    iter=1: len_bytes=-64 delta=0 prev=64 -> offset=64 tensors=1
    iter=2: len_bytes=-64 delta=0 prev=64 -> offset=64 tensors=2
    iter=3: len_bytes=-64 delta=0 prev=64 -> offset=64 tensors=3
    iter=100000: len_bytes=-64 delta=0 prev=64 -> offset=64 tensors=100000
    iter=200000: len_bytes=-64 delta=0 prev=64 -> offset=64 tensors=200000
    iter=300000: len_bytes=-64 delta=0 prev=64 -> offset=64 tensors=300000

==== WATCHDOG RESULT ====
{'mode': 'infinite_loop', 'iters': 300000, 'delta': 0, 'len_bytes': -64, 'prev': 64, 'offset': 64, 'tensors': 300000, 'tstart': 64, 'filesize': 288}
[PROVEN] np.prod int64 overflow => len_bytes=-64 => delta=0 => offset frozen at 64
[PROVEN] 300000 iterations, ZERO progress; tensors[] grew to 300000 entries from a 288-byte file (unbounded).

==== UNMODIFIED GGMLModel.load() CHECK ====
[!] UNMODIFIED GGMLModel.load() STILL RUNNING after 6.0s (did NOT return) => confirmed non-terminating loop in the real method.

[RESULT] CONFIRMED: loading a ~288-byte untrusted .ggml hangs the converter (infinite loop) and exhausts memory = Denial-of-Service.

Assertions that pass: np.prod(dims) == -64; mode == "infinite_loop"; len_bytes < 0; delta <= 0; prev == offset (frozen offset / perfect re-read loop); and the unmodified GGMLModel.load() is still alive after the join timeout.

Implementation honesty: in the unmodified-method check, model.tensors reads as 0 from outside the thread because GGMLModel.load appends to a local tensors list and only assigns self.tensors after the loop (line 197), which is never reached. The unbounded growth therefore lives in that local list (real heap memory all the same). The watchdog harness in poc_final.py/reproduce.py is what surfaces the live tensors[] count (300k+ entries) by mirroring the same loop.

Impact / realistic threat model

convert_llama_ggml_to_gguf.py exists specifically to ingest third-party/legacy GGML model files and convert them to GGUF. Any automated pipeline that runs this converter on user- or community-supplied .ggml files — a conversion/CI service, a model-hub ingestion worker, a "bring your own weights" feature, a batch migration job — can be wedged by a ~288-byte file: the worker spins one CPU at 100% and grows memory without bound until the OOM killer terminates it (or the host). No tensor data, no large file, and no valid model are required; the hang occurs in the header-parsing prologue. It is a cheap, reliable, pre-authentication-style DoS against the converting process.

It is not RCE and not information disclosure. The reproduce.py benign demo (temp marker + first line of /etc/passwd on POSIX) is included only to make the "attacker hands you a file on a real machine" model tangible for the reviewer; those operations are inert and unrelated to the bug.

Honest duplicate / prior-art note

I have not located a public advisory/CVE/issue describing this specific signed-int64 np.prod overflow in convert_llama_ggml_to_gguf.py. It should still be treated as dup-risk: integer-overflow-in-a-length-field is a well-trodden bug class, the script is old, and similar hardening may already be discussed upstream. A reviewer should grep the llama.cpp issue tracker for "convert_llama_ggml" / "n_dims" / "overflow" before accepting.
This is distinct from any modern-GGUF-loader finding (e.g. a gguf.GGUFReader issue): this is the legacy GGML/GGJT converter path, a different code file and parser.
The crafted-input technique echoes the classic "negative length → loop/rewind" pattern; the novelty (if any) is its concrete reachability and zero-delta stationary-loop construction in this specific shipped converter.

Scope flag (please confirm before submission)

This needs program-scope confirmation. The vulnerable code lives in convert_llama_ggml_to_gguf.py, a conversion utility checked into the llama.cpp repository, not in the gguf PyPI package's own modules — even though it imports gguf. Whether huntr's bounty scope for this asset covers: (a) the gguf Python package only, or (b) GGML/GGUF-handling code shipped in the llama.cpp repo (including this converter), materially affects eligibility. If the program scopes only the gguf package, this converter may be out of scope and should be redirected to llama.cpp's own security process. Flagging explicitly per honest-scope policy (ggml/tensorizer/orbax-style assets and modelaudit/picklescan-style tools always warrant a scope check).

Remediation

In Tensor.load, validate the decoded geometry before using it as an offset delta. Minimal fixes:

Compute the element count with Python big-ints / overflow-safe math and reject negative or absurd sizes:

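n_elems = 1
for d in self.dims:
    if d < 0:                              # cannot fire for '<I' values; kept as a cheap sanity check
        raise ValueError('Invalid (negative) tensor dimension')
    n_elems *= int(d)                      # python int: no silent overflow
n_bytes = (n_elems * int(tysize)) // int(blksize)
# `offset` is the tensor data start here; self.start_offset is only assigned later in load()
if n_bytes < 0 or offset + n_bytes > len(data):
    raise ValueError('Tensor length out of bounds')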

Bound each dimension value and the resulting size (e.g. assert all(0 <= d < 2**32 for d in self.dims) and check that the product stays within the file size), and assert the post-advance offset is strictly greater than orig_offset so the outer loop is guaranteed to make forward progress.
As defence-in-depth, make GGMLModel.load's while loop fail if tensor.load(...) returns a non-positive delta, instead of looping forever.
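
Taken together, the three fixes can be sketched as two small standalone functions. This is a minimal illustration under stated assumptions, not a patch for the converter: checked_tensor_nbytes and load_all are hypothetical names, and load_one is a stand-in callable for Tensor.load that returns the per-tensor delta.

```python
def checked_tensor_nbytes(dims, tysize, blksize, data_start, data_len):
    """Compute a tensor's byte length with Python ints and bounds-check it."""
    n_elems = 1
    for d in dims:
        d = int(d)
        if d < 0:
            raise ValueError('Invalid (negative) tensor dimension')
        n_elems *= d                         # arbitrary precision: cannot wrap
    n_bytes = (n_elems * int(tysize)) // int(blksize)
    if data_start + n_bytes > data_len:
        raise ValueError('Tensor length out of bounds')
    return n_bytes


def load_all(data_len, start_offset, load_one):
    """Outer loop that refuses to continue without forward progress."""
    offset = start_offset
    count = 0
    while offset < data_len:
        delta = load_one(offset)             # stand-in for tensor.load(data, offset)
        if delta <= 0:
            raise ValueError(f'No forward progress at offset {offset} (delta={delta})')
        offset += delta
        count += 1
    return count


print(checked_tensor_nbytes((2, 3), tysize=4, blksize=1, data_start=64, data_len=288))
try:
    checked_tensor_nbytes((0xFFFFFFFE, 0x80000001, 2, 16), 1, 1, 128, 288)
except ValueError as exc:
    print('rejected:', exc)
```

Either check alone stops the crafted file; keeping both means a future arithmetic mistake in the size computation still cannot stall the loop.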
