CNTK v2 model load out-of-bounds read (SIGSEGV) PoC
This repository is a security proof-of-concept for an out-of-bounds read in
Microsoft Cognitive Toolkit (CNTK) v2 when loading a crafted model file with
cntk.Function.load() / cntk.load_model().
This repository is gated on purpose: the crafted files crash a native deserializer, so do not load them outside a throwaway environment.
What is here
evil_gatherpacked.cntkmodel    crafted model, op GatherPacked, zero inputs
evil_packedindex.cntkmodel     crafted model, op PackedIndex, zero inputs
evil_scatterpacked.cntkmodel   crafted model, op ScatterPacked, zero inputs
good.cntkmodel                 benign control model (loads fine)
verify.py                      offline differential verifier (child process per load, prints exit codes)
generate.py                    regenerates all crafted files from scratch
CNTK.proto / CNTK_pb2.py       the on-disk format schema used to craft the files
Root cause
The CNTK v2 on-disk format is a protobuf-serialized Dictionary tree. Loading runs
CNTK::Function::Load -> CompositeFunction::Deserialize, which rebuilds each
PrimitiveFunction and then calls RawOutputs() -> InitOutputs() ->
PrimitiveFunction::InferOutputs() -> PrimitiveFunction::GetOutputDynamicAxes()
while still inside the load call.
GetOutputDynamicAxes (Source/CNTKv2LibraryDll/PrimitiveFunction.cpp) indexes the
operand vector by fixed position based on the op, with no bounds check:
    else if (op == PrimitiveOpType::ScatterPacked)
        outputDynamicAxes = inputs[2].DynamicAxes();
    else if ((op == PrimitiveOpType::PackedIndex) || (op == PrimitiveOpType::GatherPacked))
        outputDynamicAxes = inputs[1].DynamicAxes();
inputs is the deserialized operand list. Its length is taken verbatim from the
model file (GetInputVariables) and is never checked against the arity the op
requires. A crafted function with one of these ops and an empty inputs vector makes
inputs[1] / inputs[2] read past the end of a std::vector<Variable>. The
out-of-bounds Variable holds a garbage m_dataFields pointer, and
Variable::DynamicAxes() dereferences it, faulting.
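The missing check can be modeled in a few lines. The sketch below is an illustration in Python, not CNTK source code and not an actual patch: `MIN_OPERANDS`, `check_arity` and `is_well_formed` are hypothetical names, and the minimum counts are derived only from the highest index each branch above reads.

```python
# Illustrative model of the missing arity check; not CNTK source code.
# Minimum operand counts follow from the highest index each branch reads:
# inputs[2] needs at least 3 operands, inputs[1] needs at least 2.
MIN_OPERANDS = {
    "ScatterPacked": 3,
    "PackedIndex": 2,
    "GatherPacked": 2,
}


def check_arity(op, inputs):
    """Raise ValueError if `inputs` is too short for the index `op` reads."""
    required = MIN_OPERANDS.get(op, 0)  # ops not listed here are not modeled
    if len(inputs) < required:
        raise ValueError(
            "%s needs at least %d inputs, got %d" % (op, required, len(inputs))
        )


def is_well_formed(op, inputs):
    """Return True when the operand list is long enough for `op`."""
    try:
        check_arity(op, inputs)
    except ValueError:
        return False
    return True
```

With such a check in place before the indexed access, the three crafted files would presumably be rejected with an error instead of reaching the out-of-bounds read.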
Observed result (Linux, CNTK 2.7 CPU, Python 3.6)
good.cntkmodel                rc=0    ok       benign control LOADED_OK
evil_packedindex.cntkmodel    rc=-11  SIGSEGV  op=PackedIndex(28), 0 inputs -> inputs[1] OOB
evil_gatherpacked.cntkmodel   rc=-11  SIGSEGV  op=GatherPacked(29), 0 inputs -> inputs[1] OOB
evil_scatterpacked.cntkmodel  rc=-11  SIGSEGV  op=ScatterPacked(30), 0 inputs -> inputs[2] OOB
Backtrace at the fault:
#0 CNTK::Variable::DynamicAxes() const
#1 CNTK::PrimitiveFunction::GetOutputDynamicAxes(...)
#2 CNTK::PrimitiveFunction::InferOutputs(...)
#6 CNTK::Function::InitOutputs()
#7 CNTK::CompositeFunction::Deserialize(...)
#8 CNTK::Function::Deserialize(...)
#9 CNTK::Function::Load(...)
#10 _wrap_Function_load (cntk.Function.load)
Only the crafted mutation crashes; the otherwise-identical benign model loads fine. That differential rules out a generic large-allocation failure.
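The differential relies on loading each file in its own child process, so a native crash kills only the child and shows up as a return code. The following is a minimal sketch of that pattern, not the contents of verify.py: `describe_rc`, `run_isolated` and `load_command` are hypothetical helpers, and the child command assumes `cntk.load_model` is importable in the same interpreter.

```python
import signal
import subprocess
import sys


def describe_rc(rc):
    """Map a child return code to a short label."""
    if rc == 0:
        return "ok"
    if rc < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            return signal.Signals(-rc).name
        except ValueError:
            return "signal %d" % -rc
    return "exit %d" % rc


def run_isolated(argv, timeout=60):
    """Run argv in a child process and return its return code."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode


def load_command(path):
    """Hypothetical child command: load one model file with CNTK."""
    code = "import sys, cntk; cntk.load_model(sys.argv[1])"
    return [sys.executable, "-c", code, path]
```

Calling `describe_rc(run_isolated(load_command(path)))` for each file yields a label such as `ok` or `SIGSEGV`, which is the same kind of per-file verdict as in the table above.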
Reproduce
CNTK 2.7 ships CPU wheels for Python 3.6 only (manylinux1). In a Python 3.6 env with
cntk==2.7 installed and the OpenMPI 1.10 runtime (libmpi.so.12) on the library
path:
python verify.py
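The prerequisites listed above can be checked before running the verifier. This is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper that is not part of the repository, and it tests only the three conditions named above (interpreter version, importable `cntk` package, loadable `libmpi.so.12`).

```python
import ctypes
import importlib.util
import sys


def preflight():
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    # The reproduction steps assume a Python 3.6 interpreter.
    if sys.version_info[:2] != (3, 6):
        problems.append("Python %d.%d found, 3.6 expected" % sys.version_info[:2])
    # find_spec locates the package without importing it.
    if importlib.util.find_spec("cntk") is None:
        problems.append("cntk package is not importable")
    # CDLL uses the dynamic loader's search path, as the cntk import would.
    try:
        ctypes.CDLL("libmpi.so.12")
    except OSError:
        problems.append("libmpi.so.12 is not on the library path")
    return problems
```

An empty list from `preflight()` suggests the environment matches the description; any entries name the missing piece.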
To rebuild the crafted files from a fresh benign model:
python -m grpc_tools.protoc -I. --python_out=. CNTK.proto
python generate.py