SuperMarioBros-Nes-v0 Level1-2 PPO

PPO policy checkpoint for completing SuperMarioBros-Nes-v0 Level1-2 with Stable Retro, trained with rlab.

At a Glance

Item	Value
Task	Complete `SuperMarioBros-Nes-v0` `Level1-2`
Model	Stable-Baselines3 PPO
Format	SB3 `.zip` checkpoint
Checkpoint	`model.zip`
W&B artifact	`tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000`
Checkpoint step	`4500000`
Eval completion rate	`100/100` episodes (100.0%)
Eval mean reward	`3148.450`
Eval max x-position	`3129`
Training peak signal	`train/info/level_complete/rate/min/last = 0.98` near global steps `4741488` to `4745680`
W&B run	`b272-l12-b55-transfer-s6-20260703T171021Z`
YouTube preview	https://www.youtube.com/watch?v=emJ0NHXhUIg

Quick Start

Install rlab once, import the ROM, then play or evaluate this checkpoint directly from Hugging Face:

uv tool install --from git+https://github.com/tsilva/rlab rlab
rlab import-roms ~/roms --game SuperMarioBros-Nes-v0
rlab play hf://tsilva/SuperMarioBros-NES_Level1-2
rlab eval hf://tsilva/SuperMarioBros-NES_Level1-2
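
For readers who want to load the checkpoint without the rlab CLI, the following is a minimal sketch using `huggingface_hub` and Stable-Baselines3 directly. It assumes the `hf://` URI above maps to the Hub repo id `tsilva/SuperMarioBros-NES_Level1-2` and that the checkpoint is stored there as `model.zip` (as listed under Files). The helper name `load_policy` is hypothetical. The sketch only loads the policy; building a matching environment (see Environment Details) is still required to run it.

```python
REPO_ID = "tsilva/SuperMarioBros-NES_Level1-2"  # assumed from the hf:// URI
FILENAME = "model.zip"                           # checkpoint name from the Files table


def load_policy(repo_id: str = REPO_ID, filename: str = FILENAME):
    """Download the checkpoint from the Hugging Face Hub and load it with SB3.

    Imports are done lazily so this module can be imported even when the
    optional dependencies are not installed.
    """
    from huggingface_hub import hf_hub_download
    from stable_baselines3 import PPO

    # Fetch (or reuse from the local cache) the checkpoint archive.
    checkpoint_path = hf_hub_download(repo_id=repo_id, filename=filename)
    # Load on CPU; no environment is attached at this point.
    return PPO.load(checkpoint_path, device="cpu")
```

Calling `load_policy()` returns a `PPO` object whose `predict(obs, deterministic=True)` method expects observations preprocessed the same way as during training.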

For the original W&B artifact:

rlab play tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000 --policy-env fast

Validate

This release is the nearest uploaded checkpoint to the seed-6 training peak. It was then re-evaluated from scratch during publication staging:

rlab eval tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000 --episodes 100 --deterministic

The preview video in `replay.mp4` was generated from the best episode observed during the same deterministic evaluation pass.

Results

Metric	Value
Completion rate	100/100 (100.0%)
Mean reward	3148.450
Max x-position	3129
Best episode reward	3148.450
Best episode max x-position	3129
Checkpoint step	4500000
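
The table values can be reproduced from per-episode records with simple aggregation. The sketch below is illustrative only: the `Episode` record and `summarize` function are hypothetical names, not rlab's API. It also shows that when every episode produces the same outcome, as can happen with a deterministic policy, the mean and best-episode figures coincide, which is consistent with the table above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are assumptions."""
    reward: float
    max_x: int
    completed: bool


def summarize(episodes):
    """Aggregate episode records into the metrics reported in the table."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    completed = sum(1 for e in episodes if e.completed)
    best = max(episodes, key=lambda e: e.reward)
    return {
        "completion": f"{completed}/{n} ({100.0 * completed / n:.1f}%)",
        "mean_reward": round(mean([e.reward for e in episodes]), 3),
        "max_x": max(e.max_x for e in episodes),
        "best_reward": best.reward,
        "best_max_x": best.max_x,
    }
```

For example, passing 100 identical records with reward `3148.45`, max x-position `3129`, and `completed=True` gives a completion string of `100/100 (100.0%)` and equal mean and best rewards.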

Files

File	Description
`model.zip`	SB3 PPO checkpoint
`replay.mp4`	Representative preview episode for the Hugging Face RL widget
`model_metadata.json`	Downloaded W&B artifact metadata plus publish-time training-peak note
`release_manifest.json`	Release provenance, eval metrics, and video verification inputs
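
An SB3 `.zip` checkpoint is an ordinary zip archive, so it can be inspected with the standard library before loading. The sketch below lists the archive members and reads the `_stable_baselines3_version` entry that SB3 writes when saving. The exact member list depends on the SB3 version, and the function name `read_checkpoint_summary` is hypothetical.

```python
import io
import zipfile


def read_checkpoint_summary(source):
    """Return member names and the recorded SB3 version of a checkpoint.

    `source` may be a filesystem path (e.g. "model.zip") or a binary
    file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        members = sorted(zf.namelist())
        version = None
        # SB3 stores its version string as a small text entry in the archive.
        if "_stable_baselines3_version" in members:
            version = zf.read("_stable_baselines3_version").decode("utf-8").strip()
    return {"members": members, "sb3_version": version}
```

Running `read_checkpoint_summary("model.zip")` on a downloaded checkpoint shows which SB3 version produced it, which helps when a load fails because of a version mismatch.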

Environment Details

Setting	Value
`env_provider`	`supermariobrosnes-turbo`
`game`	`SuperMarioBros-Nes-v0`
`state`	`Level1-2`
`action_set`	`simple`
`frame_skip`	`4`
`max_pool_frames`	`False`
`hud_crop_top`	`32`
`observation_size`	`84`
`eval_done_on_events`	`level_change`
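
To make the settings above concrete, here is a rough pure-Python sketch of what `hud_crop_top`, `observation_size`, and `frame_skip` typically mean in this kind of pipeline. It is not rlab's implementation. It assumes a 224x240 single-channel frame, cropping before resizing, nearest-neighbour sampling, and a simplified `step` callable returning `(obs, reward, done)`. With `max_pool_frames` set to `False`, the sketch returns the last frame as is rather than a pixel-wise maximum over recent frames.

```python
HUD_CROP_TOP = 32      # rows removed from the top of the frame (HUD area)
OBSERVATION_SIZE = 84  # side length of the square observation
FRAME_SKIP = 4         # emulator steps per agent action


def crop_and_resize(frame, crop_top=HUD_CROP_TOP, size=OBSERVATION_SIZE):
    """Drop the top rows, then nearest-neighbour resize to size x size.

    `frame` is a list of rows, each row a list of pixel values.
    """
    cropped = frame[crop_top:]
    src_h, src_w = len(cropped), len(cropped[0])
    return [
        [cropped[(y * src_h) // size][(x * src_w) // size] for x in range(size)]
        for y in range(size)
    ]


def repeat_action(step, action, frame_skip=FRAME_SKIP):
    """Apply one action for up to `frame_skip` steps, summing rewards.

    Stops early if the episode ends; returns the last observation.
    """
    total_reward = 0.0
    for _ in range(frame_skip):
        obs, reward, done = step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done
```

A policy trained with one set of these values will generally not behave correctly if evaluated with another, so the table should be matched exactly when rebuilding the environment.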

Provenance

Source project: rlab
W&B run: b272-l12-b55-transfer-s6-20260703T171021Z
W&B artifact: tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000
Selection basis: nearest uploaded checkpoint to the seed-6 training peak
Eval source: fresh local publication staging eval, deterministic policy, 100 episodes, seed 10000
YouTube preview: https://www.youtube.com/watch?v=emJ0NHXhUIg
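
The W&B artifact reference above follows the usual `entity/project/name:alias` shape, and in this document the artifact name is the run name plus a `-checkpoint` suffix, with the alias carrying the step. The sketch below splits a reference into those parts. The function name `parse_artifact_ref` is hypothetical, and the suffix and `step-` conventions are inferred from this card only, not from a documented rlab format.

```python
def parse_artifact_ref(ref: str) -> dict:
    """Split 'entity/project/name:alias' into its components."""
    path, sep, alias = ref.partition(":")
    if not sep:
        raise ValueError("expected an ':alias' suffix")
    entity, project, name = path.split("/")
    # Convention inferred from this card: '<run>-checkpoint' and 'step-<N>'.
    run = name.removesuffix("-checkpoint")
    step = int(alias.removeprefix("step-")) if alias.startswith("step-") else None
    return {
        "entity": entity,
        "project": project,
        "name": name,
        "alias": alias,
        "run": run,
        "step": step,
    }
```

Applied to the artifact listed here, this yields entity `tsilva`, project `SuperMarioBros-NES`, run `b272-l12-b55-transfer-s6-20260703T171021Z`, and step `4500000`, matching the values in the At a Glance table.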

Limitations

This checkpoint was selected from a training metric peak and then evaluated for publication. The reported metrics apply only to `Level1-2` under the environment settings listed in Environment Details, and should not be treated as cross-environment benchmark results.
