Doom on ONNX

Doom DEMO1 rendered by the ONNX model

A single self-contained ONNX model (doom.onnx, ~8.4 MB) that, when run on any ONNX Runtime CPU EP, boots and renders the original 1993 Doom. No custom operators, no execution-provider plugins, no Python in the loop: just standard ONNX ops (Add, BitwiseAnd, Where, Gather, ScatterElements, Loop, If, …) executing inside a single InferenceSession.run call.

The model contains:

  • an RV32IM CPU built entirely out of ONNX operators,
  • the doom1.wad shareware game data as a read-only initializer,
  • the doomgeneric Doom source cross-compiled to bare-metal RV32IM and baked into RAM as another initializer.
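To give a feel for what "a CPU built out of ONNX operators" can mean, here is a minimal NumPy sketch of one branch-free execution step for just two RV32I instructions, ADDI and ADD. It is an illustration, not the graph inside doom.onnx: the function name step_add and the choice of instructions are assumptions made for this example. Each NumPy call stands in for an ONNX op of the same flavor (masking for BitwiseAnd, indexing for Gather, np.where for Where).

```python
import numpy as np

OP_IMM = 0x13  # opcode of register-immediate ops such as ADDI
OP_REG = 0x33  # opcode of register-register ops such as ADD

def step_add(inst: int, regs: np.ndarray) -> np.ndarray:
    """Apply one ADDI/ADD instruction to a 32-entry int32 register file.

    Hypothetical helper: any other instruction leaves the registers unchanged.
    """
    word = np.int64(inst)
    # Field extraction uses only shifts and masks.
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    funct7 = (word >> 25) & 0x7F
    imm = (word >> 20) & 0xFFF
    imm = np.where(imm >= 0x800, imm - 0x1000, imm)  # sign-extend 12 bits

    # Operand selection is a data-dependent select, not a Python branch.
    a = regs[rs1].astype(np.int64)
    b = np.where(opcode == OP_REG, regs[rs2].astype(np.int64), imm)

    # 32-bit wrap-around add, then reinterpret as signed.
    result = (a + b) & 0xFFFFFFFF
    result = np.where(result >= 2**31, result - 2**32, result)

    is_add = (funct3 == 0) & (
        (opcode == OP_IMM) | ((opcode == OP_REG) & (funct7 == 0))
    )
    # Write back only to rd, and never to x0.
    idx = np.arange(32)
    write = is_add & (idx == rd) & (idx != 0)
    return np.where(write, result, regs).astype(np.int32)

regs = np.zeros(32, dtype=np.int32)
regs = step_add(0x00500093, regs)  # addi x1, x0, 5
regs = step_add(0xFFF00113, regs)  # addi x2, x0, -1
regs = step_add(0x002081B3, regs)  # add  x3, x1, x2
```

A full core would compute every instruction's candidate result this way and select among them, which is one reason per-instruction cost is high when expressed as tensor ops.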

Reference render

The doom.gif in this repo was assembled from 74 PNG frames captured during a single InferenceSession.run invocation:

  • Total: 80,000,000 RV32IM instructions, 10.8 hours wall time
  • Rate: 1,562 IPS (init code) → 2,053 IPS (in-game rendering)
  • Reached: title wipe → menu → DEMO1 load → game logic → 3D BSP rendering of actual gameplay (frames 54–75)

Performance

~2,000 simulated RV32IM instructions per second on a modern laptop CPU. This is not a real-time emulator. One frame every ~9 minutes is the reality on a single CPU thread. See PERF_INVESTIGATION.md in the source repo for the full investigation (TL;DR: ORT's MayInplace alias doesn't apply to Loop-carried state, so the 8 MiB RAM gets fully copied per ScatterElements).
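The headline figures can be cross-checked from the reference render numbers quoted above (80,000,000 instructions, 10.8 hours, 74 frames). The short calculation below is only that arithmetic. The variable names are chosen for this example, and the per-frame instruction count is an average over the whole run, not a measured value.

```python
# Figures taken from the reference render described in this card.
instructions = 80_000_000
wall_hours = 10.8
frames = 74

# Average simulated instructions per wall-clock second.
avg_ips = instructions / (wall_hours * 3600)

# Average wall-clock minutes and simulated instructions per captured frame.
minutes_per_frame = wall_hours * 60 / frames
insts_per_frame = instructions / frames

print(f"average rate: {avg_ips:,.0f} IPS")
print(f"time per frame: {minutes_per_frame:.1f} min")
print(f"instructions per frame: {insts_per_frame:,.0f}")
```

The averages come out at roughly 2,000 IPS and a little under 9 minutes per frame, in line with the figures stated above.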

Running

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("doom.onnx", providers=["CPUExecutionProvider"])

RAM_SIZE = 8 * 1024 * 1024
ram = np.load("initial_ram.npy")  # Doom ELF baked into RAM
pc  = np.array(0x1000, dtype=np.int32)
regs = np.zeros(32, dtype=np.int32)

MMIO_TICK = RAM_SIZE - 16
sim_ms = 0
for chunk in range(250):
    ram[MMIO_TICK:MMIO_TICK + 4] = np.frombuffer(
        np.uint32(sim_ms).tobytes(), dtype=np.uint8)
    sim_ms += 100
    pc, regs, ram = sess.run(None, {
        "pc_in": pc, "regs_in": regs, "ram_in": ram,
        "trip_count": np.array(100_000, dtype=np.int64),
    })
    # framebuffer at RAM_SIZE - 32 - 64000, 320×200 palette indices

The host only writes a millisecond counter into the MMIO tick register between chunks and reads the framebuffer out of the returned ram_out.
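Reading a frame back could look like the sketch below. It uses the framebuffer offset and the 320×200 size given in the code comment above, and it assumes the pixels are stored row by row, which this card does not state explicitly. The helper names read_framebuffer and to_rgb are hypothetical, and the grayscale palette is a placeholder: real colors would come from the WAD's PLAYPAL lump, which this sketch does not parse.

```python
import numpy as np

RAM_SIZE = 8 * 1024 * 1024
FB_WIDTH, FB_HEIGHT = 320, 200
FB_BASE = RAM_SIZE - 32 - FB_WIDTH * FB_HEIGHT  # offset quoted in this card

def read_framebuffer(ram: np.ndarray) -> np.ndarray:
    """Return a (200, 320) uint8 array of palette indices copied out of RAM.

    Assumes a row-major layout (320 bytes per scanline).
    """
    fb = ram[FB_BASE:FB_BASE + FB_WIDTH * FB_HEIGHT]
    return fb.reshape(FB_HEIGHT, FB_WIDTH).copy()

def to_rgb(indices: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Map palette indices to RGB with a (256, 3) uint8 palette."""
    return palette[indices]

# Placeholder palette: index i maps to gray level i.
gray_palette = np.repeat(np.arange(256, dtype=np.uint8)[:, None], 3, axis=1)

# Stand-in for the ram array returned by sess.run.
ram = np.zeros(RAM_SIZE, dtype=np.uint8)
frame = to_rgb(read_framebuffer(ram), gray_palette)  # shape (200, 320, 3)
```

The resulting array can be handed to any image library to write a PNG per chunk.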

Inputs / outputs

Name                          Type          Shape    Role
pc_in                         int32         scalar   program counter
regs_in                       int32         [32]     x0..x31 (x0 forced to 0)
ram_in                        uint8         [8 MiB]  full writable memory
trip_count                    int64         scalar   how many insts to execute
pc_out / regs_out / ram_out   (same types)           post-state

rom (the WAD) is a read-only initializer baked into the model.
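ONNX Runtime is strict about input dtypes, so it can help to build the feed dictionary in one place. The helper below is a sketch that coerces each value to the type and shape listed above and fails early on a mismatch. The name make_feed is hypothetical, and the check assumes ram_in is a flat array of exactly 8 MiB.

```python
import numpy as np

RAM_SIZE = 8 * 1024 * 1024

def make_feed(pc, regs, ram, trip_count):
    """Build the input dict for sess.run using the dtypes listed in this card."""
    feed = {
        "pc_in": np.asarray(pc, dtype=np.int32),
        "regs_in": np.asarray(regs, dtype=np.int32),
        "ram_in": np.asarray(ram, dtype=np.uint8),
        "trip_count": np.asarray(trip_count, dtype=np.int64),
    }
    # Expected shapes; the flat (RAM_SIZE,) layout is an assumption.
    expected = {
        "pc_in": (),
        "regs_in": (32,),
        "ram_in": (RAM_SIZE,),
        "trip_count": (),
    }
    for name, shape in expected.items():
        if feed[name].shape != shape:
            raise ValueError(
                f"{name}: expected shape {shape}, got {feed[name].shape}"
            )
    return feed

feed = make_feed(
    pc=0x1000,
    regs=np.zeros(32, dtype=np.int32),
    ram=np.zeros(RAM_SIZE, dtype=np.uint8),
    trip_count=100_000,
)
```

The returned dictionary can be passed directly as the second argument of sess.run.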

License

The CPU + DMA glue is MIT. doomgeneric and the original Doom source are GPL-2.0; the shareware doom1.wad ships under id Software's shareware terms. This model card and model file inherit GPL-2.0.

Acknowledgements

id Software for releasing Doom's source. The doomgeneric project for the platform-agnostic port. The ONNX team for an op set this absurdly expressive.
