Apache Avro Python deflate decompression bomb PoC
This repository contains a minimal proof of concept for a resource-exhaustion issue in Apache Avro's official Python reader.
Affected:
- avro==1.12.1 from PyPI
- Apache Avro main at commit 840dc8139f4b3d1bfa8e8c8f1ac3be949b440634
Root cause:
- avro.datafile.DataFileReader.__next__() calls _read_block_header().
- _read_block_header() calls codec.decompress(self.raw_decoder).
- avro.codecs.DeflateCodec.decompress() reads the compressed Avro block and calls zlib.decompress(data, -15) without any maximum decompressed-size or expansion-ratio limit.
The included avro-deflate-128m.avro is a valid deflate-coded Avro object container file. It is 130,634 bytes on disk and expands to a 134,217,728-byte bytes field during normal DataFileReader iteration.
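The sizes above work out to an expansion of roughly 1000x. The sketch below computes that figure and, for comparison, measures how far raw deflate shrinks a zero-filled buffer. The zero-filled payload is an assumption made for illustration; this README does not state what the trigger file's bytes field contains, and raw_deflate_ratio is a hypothetical helper, not part of the PoC scripts.

```python
import zlib

# Sizes reported above for avro-deflate-128m.avro
FILE_SIZE = 130_634          # bytes on disk (whole container file, header included)
PAYLOAD_LEN = 134_217_728    # 128 * 1024 * 1024 bytes after decompression

print(f"reported expansion: about {PAYLOAD_LEN / FILE_SIZE:.0f}x")


def raw_deflate_ratio(payload_len: int) -> float:
    """Compress payload_len zero bytes as raw deflate and return the expansion ratio.

    Assumption: a zero-filled payload, chosen only because it compresses well.
    """
    # wbits=-15 selects a raw deflate stream, matching zlib.decompress(data, -15)
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    compressed = compressor.compress(bytes(payload_len)) + compressor.flush()
    return payload_len / len(compressed)


print(f"zero-filled 1 MiB: about {raw_deflate_ratio(1024 * 1024):.0f}x")
```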
Reproduction
python3 -m venv .venv
.venv/bin/pip install avro==1.12.1
.venv/bin/python verify_avro_deflate_bomb_poc.py \
avro-deflate-control.avro \
avro-deflate-128m.avro
Expected result on the test host:
control file_size=181 -> loaded, payload_len=1024, maxrss_after_kb around 18,000
bomb file_size=130634 -> loaded, payload_len=134217728, maxrss_after_kb around 280,000
With a 160 MiB address-space cap, the control loads while the bomb fails in the Avro block decompression path:
.venv/bin/python verify_avro_deflate_bomb_poc.py --limit-mb 160 \
avro-deflate-control.avro \
avro-deflate-128m.avro
Observed exception:
MemoryError: Unable to allocate output buffer.
File ".../avro/datafile.py", line 404, in __next__
File ".../avro/datafile.py", line 386, in _read_block_header
File ".../avro/codecs.py", line 126, in decompress
uncompressed = zlib.decompress(data, -15)
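One way to apply an address-space cap like the 160 MiB one used above is resource.setrlimit with RLIMIT_AS. This is a sketch of that general technique on Unix-like systems, not a copy of what verify_avro_deflate_bomb_poc.py does behind --limit-mb; cap_address_space is a hypothetical helper name.

```python
import resource  # Unix-only standard library module


def cap_address_space(limit_mb: int) -> None:
    """Lower this process's soft address-space limit to limit_mb MiB.

    Assumption: RLIMIT_AS is the limit of interest; the PoC verifier may
    implement its cap differently.
    """
    limit_bytes = limit_mb * 1024 * 1024
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed the hard limit, so clamp to it if one is set
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    # Leave the hard limit unchanged so the cap can be lifted again later
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
```

Calling cap_address_space(160) before opening a file with DataFileReader should make oversized allocations fail with MemoryError instead of consuming host memory, which is the behaviour the traceback above shows.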
Files
- avro-deflate-128m.avro - 130,634-byte trigger file, SHA256 a050bf7715a45d46f0abe327b94557e7d5f209cbdb549292de9e3fe5104df8f0
- avro-deflate-control.avro - 181-byte control file, SHA256 fd1d3d5cc0722727329d536ee8ab20e4fc3da2629c8921ffde850e96056d7ae9
- make_avro_deflate_bomb_poc.py - generator
- verify_avro_deflate_bomb_poc.py - verifier
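The two .avro files can be checked against the SHA256 values listed above before use. A minimal sketch, assuming the files sit in the current directory; sha256_of and check_files are hypothetical helper names and are not part of the repository's scripts.

```python
import hashlib
from pathlib import Path

# Digests copied from the file list above
EXPECTED = {
    "avro-deflate-128m.avro":
        "a050bf7715a45d46f0abe327b94557e7d5f209cbdb549292de9e3fe5104df8f0",
    "avro-deflate-control.avro":
        "fd1d3d5cc0722727329d536ee8ab20e4fc3da2629c8921ffde850e96056d7ae9",
}


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(directory: str = ".") -> dict:
    """Map each expected file name to True (match), False (mismatch or missing)."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in check_files().items():
        print(f"{name}: {'ok' if ok else 'MISMATCH or missing'}")
```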
Notes
Apache Avro Java recently added decompression-size limits for the same class of codec bomb in AVRO-4247. This PoC demonstrates that the official Python Avro reader still lacks an equivalent limit in the latest PyPI release and in current Apache Avro main.
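For illustration, a decompressed-size limit can be enforced on a raw deflate stream with zlib.decompressobj and its max_length argument. This is a sketch of the general technique only: it is neither the AVRO-4247 change nor a patch to avro.codecs, and bounded_raw_inflate and DecompressedSizeExceeded are hypothetical names.

```python
import zlib


class DecompressedSizeExceeded(ValueError):
    """Raised when a block inflates past the configured limit (hypothetical)."""


def bounded_raw_inflate(data: bytes, max_output: int) -> bytes:
    """Inflate a raw deflate stream, refusing to return more than max_output bytes."""
    if max_output < 0:
        raise ValueError("max_output must be non-negative")
    inflater = zlib.decompressobj(-15)  # -15: raw deflate, as in zlib.decompress(data, -15)
    # Request one byte more than the limit so an overflow is detectable.
    # (max_length=0 would mean "unlimited", so the +1 also avoids that case.)
    out = inflater.decompress(data, max_output + 1)
    if len(out) > max_output:
        raise DecompressedSizeExceeded(
            f"block expands beyond {max_output} bytes"
        )
    if not inflater.eof:
        # Mirror zlib.decompress(), which rejects truncated streams
        raise zlib.error("incomplete or truncated deflate stream")
    return out
```

With this approach the output buffer never grows past max_output + 1 bytes, so a block like the one in avro-deflate-128m.avro would be rejected before the full 128 MiB is allocated. Choosing the limit (a fixed size or a ratio of the compressed length) is a policy decision this sketch leaves open.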