How to use tsilva/SuperMarioBros-NES_Level1-2 with stable-baselines3:
from huggingface_sb3 import load_from_hub

checkpoint = load_from_hub(
    repo_id="tsilva/SuperMarioBros-NES_Level1-2",
    filename="model.zip",
)
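The snippet above only downloads the checkpoint file. A minimal sketch of the next step follows, assuming the archive loads with the stock `PPO.load` from Stable-Baselines3 and that the file is named `model.zip` as listed under Files. `load_policy` is a hypothetical helper name, and if the policy depends on custom components from rlab, loading it outside rlab may need extra arguments.

```python
REPO_ID = "tsilva/SuperMarioBros-NES_Level1-2"
FILENAME = "model.zip"  # checkpoint name listed in the Files table


def load_policy(repo_id: str = REPO_ID, filename: str = FILENAME):
    """Download the checkpoint from the Hub and load it as an SB3 PPO model.

    Hypothetical helper, not part of rlab or huggingface_sb3.
    """
    # Imported lazily so this module can be imported without the RL stack installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    # load_from_hub returns the local path of the cached checkpoint file.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    # No env is passed, so the model can predict but cannot resume training as-is.
    return PPO.load(checkpoint_path)


# Usage (needs network access plus huggingface_sb3 and stable-baselines3):
#   model = load_policy()
#   action, _state = model.predict(observation, deterministic=True)
```

The observation passed to `predict` would have to match the preprocessing listed under Environment Details, which is why the `rlab play` and `rlab eval` commands are likely the simpler route.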
SuperMarioBros-Nes-v0 Level1-2 PPO
PPO policy checkpoint for completing SuperMarioBros-Nes-v0 Level1-2 with Stable Retro, trained with rlab.
At a Glance
| Item | Value |
|---|---|
| Task | Complete SuperMarioBros-Nes-v0 Level1-2 |
| Model | Stable-Baselines3 PPO |
| Format | SB3 .zip checkpoint |
| Checkpoint | model.zip |
| W&B artifact | tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000 |
| Checkpoint step | 4500000 |
| Eval completion rate | 100/100 episodes (100.0%) |
| Eval mean reward | 3148.450 |
| Eval max x-position | 3129 |
| Training peak signal | train/info/level_complete/rate/min/last = 0.98 near global step 4741488-4745680 |
| W&B run | b272-l12-b55-transfer-s6-20260703T171021Z |
| YouTube preview | https://www.youtube.com/watch?v=emJ0NHXhUIg |
Quick Start
Install rlab once, import the ROM, then play or evaluate this checkpoint directly from Hugging Face:
uv tool install --from git+https://github.com/tsilva/rlab rlab
rlab import-roms ~/roms --game SuperMarioBros-Nes-v0
rlab play hf://tsilva/SuperMarioBros-NES_Level1-2
rlab eval hf://tsilva/SuperMarioBros-NES_Level1-2
For the original W&B artifact:
rlab play tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000 --policy-env fast
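The artifact reference above appears to follow the usual W&B `entity/project/name:alias` layout. As a rough sketch under that assumption, it can be split into its parts like this; `ArtifactRef` and `parse_artifact_ref` are illustrative names, not rlab or W&B APIs.

```python
from typing import NamedTuple


class ArtifactRef(NamedTuple):
    entity: str
    project: str
    name: str
    alias: str


def parse_artifact_ref(ref: str) -> ArtifactRef:
    """Split 'entity/project/name:alias' into its four parts."""
    path, sep, alias = ref.rpartition(":")
    if not sep or not alias:
        raise ValueError(f"missing ':alias' suffix in {ref!r}")
    parts = path.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected entity/project/name before the alias in {ref!r}")
    return ArtifactRef(parts[0], parts[1], parts[2], alias)


ref = parse_artifact_ref(
    "tsilva/SuperMarioBros-NES/"
    "b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000"
)
# The alias encodes the checkpoint step as 'step-<N>'.
checkpoint_step = int(ref.alias.split("-")[-1])
print(ref.entity, ref.project, checkpoint_step)
```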
Validate
This release is the nearest uploaded checkpoint to the seed-6 training peak. It was then re-evaluated from scratch during publication staging:
rlab eval tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000 --episodes 100 --deterministic
The preview video in replay.mp4 was generated from the best episode observed during the same deterministic evaluation pass.
Results
| Metric | Value |
|---|---|
| Completion rate | 100/100 (100.0%) |
| Mean reward | 3148.450 |
| Max x-position | 3129 |
| Best episode reward | 3148.450 |
| Best episode max x-position | 3129 |
| Checkpoint step | 4500000 |
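For readers who want to reproduce the table from their own evaluation logs, here is a minimal sketch of how such metrics could be aggregated. The `Episode` record and `summarize` function are hypothetical, and the input below is illustrative rather than the actual per-episode log. It uses 100 identical episodes, which is consistent with the table because the mean reward equals the best-episode reward.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    reward: float
    max_x: int
    completed: bool


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate per-episode records into the metrics reported in the table."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    completed = sum(1 for ep in episodes if ep.completed)
    # "Best" here means the episode with the highest total reward.
    best = max(episodes, key=lambda ep: ep.reward)
    return {
        "completion_rate_pct": 100.0 * completed / n,
        "mean_reward": sum(ep.reward for ep in episodes) / n,
        "max_x": max(ep.max_x for ep in episodes),
        "best_episode_reward": best.reward,
        "best_episode_max_x": best.max_x,
    }


# Illustrative input only: values copied from the table, not real episode logs.
episodes = [Episode(reward=3148.45, max_x=3129, completed=True) for _ in range(100)]
summary = summarize(episodes)
print(f"{summary['completion_rate_pct']:.1f}% complete, "
      f"mean reward {summary['mean_reward']:.3f}")
```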
Files
| File | Description |
|---|---|
| model.zip | SB3 PPO checkpoint |
| replay.mp4 | Representative preview episode for the Hugging Face RL widget |
| model_metadata.json | Downloaded W&B artifact metadata plus publish-time training-peak note |
| release_manifest.json | Release provenance, eval metrics, and video verification inputs |
Environment Details
| Setting | Value |
|---|---|
| env_provider | supermariobrosnes-turbo |
| game | SuperMarioBros-Nes-v0 |
| state | Level1-2 |
| action_set | simple |
| frame_skip | 4 |
| max_pool_frames | False |
| hud_crop_top | 32 |
| observation_size | 84 |
| eval_done_on_events | level_change |
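To illustrate what `frame_skip = 4` with `max_pool_frames = False` conventionally means, here is a toy sketch. It assumes the settings carry their usual meaning: each agent action is repeated for 4 emulator frames, rewards are summed, and the last frame is returned without a pixel-wise max over recent frames. This is not rlab's implementation. `CountingEnv` and `FrameSkip` are hypothetical stand-ins, and the older 4-tuple `step` return is used only for brevity.

```python
class CountingEnv:
    """Toy stand-in for an emulator: reward 1.0 per frame, ends after `length` frames."""

    def __init__(self, length=10):
        self.length = length
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame

    def step(self, action):
        self.frame += 1
        done = self.frame >= self.length
        # The observation is just the frame counter in this toy env.
        return self.frame, 1.0, done, {}


class FrameSkip:
    """Repeat each action `skip` times, summing rewards and keeping the last frame."""

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be at least 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop repeating once the episode has ended
        # No max-pooling: the most recent frame is returned unchanged.
        return obs, total_reward, done, info


env = FrameSkip(CountingEnv(length=10), skip=4)
env.reset()
first = env.step(0)   # covers frames 1-4
second = env.step(0)  # covers frames 5-8
third = env.step(0)   # covers frames 9-10, then the episode ends
```

With a 10-frame episode, the third agent step covers only two frames, so its summed reward is smaller than the first two.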
Provenance
- Source project: rlab
- W&B run: b272-l12-b55-transfer-s6-20260703T171021Z
- W&B artifact: tsilva/SuperMarioBros-NES/b272-l12-b55-transfer-s6-20260703T171021Z-checkpoint:step-4500000
- Selection basis: nearest uploaded checkpoint to the seed-6 training peak
- Eval source: fresh local publication staging eval, deterministic policy, 100 episodes, seed 10000
- YouTube preview: https://www.youtube.com/watch?v=emJ0NHXhUIg
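The "nearest uploaded checkpoint" selection can be sketched as a simple minimum over step distance. The list of uploaded steps below is hypothetical, because the card does not state the upload cadence. Only the peak window (global step 4741488-4745680) and the selected step (4500000) come from this card.

```python
def nearest_checkpoint(uploaded_steps, peak_step):
    """Return the uploaded checkpoint step closest to the training-peak step."""
    if not uploaded_steps:
        raise ValueError("no uploaded checkpoints to choose from")
    return min(uploaded_steps, key=lambda step: abs(step - peak_step))


PEAK_STEP = 4_741_488  # start of the peak window reported in At a Glance
# Hypothetical upload schedule, for illustration only.
uploaded = [4_000_000, 4_500_000, 5_000_000]

selected = nearest_checkpoint(uploaded, PEAK_STEP)
print(selected)  # 4500000
```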
Limitations
This checkpoint was selected from a training metric peak and then evaluated for publication. Reported metrics are task-specific and should not be treated as cross-environment benchmark results.