sourav6565/ppo-LunarLander-v2


Deep RL Course - LunarLander-v2 (Unit 1 + Unit 8)

This repo bundles the two LunarLander-v2 agents from the Hugging Face Deep RL Course: the Stable-Baselines3 PPO agent from Unit 1 and the from-scratch PyTorch PPO agent from Unit 8.

unit1/ - PPO via Stable-Baselines3

File: unit1/ppo-LunarLander-v2.zip
Algorithm: PPO (Stable-Baselines3), MlpPolicy
mean_reward: ~229.64 (solved, i.e. above the conventional 200-point threshold for LunarLander)
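
As a minimal sketch of how the Unit 1 checkpoint might be loaded and re-evaluated: the repo id below is inferred from this page's owner and repo name, and the helper names (`is_solved`, `load_and_evaluate`) are hypothetical, not part of this repo. It assumes `huggingface_hub`, `stable-baselines3`, and a `gymnasium` version that still registers `LunarLander-v2` (with Box2D) are installed. The 200-point threshold is the conventional "solved" score for LunarLander, not something this card defines.

```python
# Sketch only: names and defaults here are assumptions, not this repo's code.
REPO_ID = "sourav6565/ppo-LunarLander-v2"   # assumed from the page header
FILENAME = "unit1/ppo-LunarLander-v2.zip"   # path listed in this card
SOLVED_THRESHOLD = 200.0                    # conventional LunarLander threshold


def is_solved(mean_reward, threshold=SOLVED_THRESHOLD):
    """Return True if the mean episode reward reaches the threshold."""
    return mean_reward >= threshold


def load_and_evaluate(n_eval_episodes=10):
    """Download the checkpoint, load it with SB3, and evaluate it.

    Third-party imports live inside the function so that the helper above
    can be used without the RL stack installed.
    """
    import gymnasium as gym
    from huggingface_hub import hf_hub_download
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    checkpoint_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
    model = PPO.load(checkpoint_path)

    # Monitor records true episode returns for evaluate_policy.
    env = Monitor(gym.make("LunarLander-v2"))
    mean_reward, std_reward = evaluate_policy(
        model, env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    env.close()
    return mean_reward, std_reward


# Example (requires network access and the packages above):
# mean_reward, std_reward = load_and_evaluate()
# print(f"{mean_reward:.2f} +/- {std_reward:.2f}", is_solved(mean_reward))
```

Results from a fresh evaluation will vary with the random seed and the number of episodes, so they need not match the figure reported above exactly.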

unit8/ - PPO from scratch (CleanRL-style)

Weights: unit8/ppo_scratch_lunarlander.pt
Code: unit8/ppo_scratch.py
Algorithm: PPO implemented from scratch in PyTorch (GAE advantages, clipped surrogate objective, value loss, entropy bonus, learning-rate annealing, gradient clipping)
Training: 4,000,000 timesteps, 8 parallel envs
mean_reward: ~156 (deterministic policy, averaged over 50 evaluation episodes)
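
To illustrate three of the ingredients listed above (GAE, the clipped surrogate objective, and linear LR annealing), here is a dependency-free sketch in plain Python. It is not the code in unit8/ppo_scratch.py: the function names are hypothetical, it handles a single environment and scalar log-probabilities for readability, and the defaults (`gamma=0.99`, `lam=0.95`, `clip_coef=0.2`) are common PPO choices assumed here rather than values taken from this repo.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one rollout of one environment.

    dones[t] is True if the episode ended at step t, in which case the
    value of the following state is not bootstrapped.
    Returns (advantages, returns), where returns = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate_loss(new_logprob, old_logprob, advantage, clip_coef=0.2):
    """PPO clipped policy loss for a single sample (to be minimized)."""
    ratio = math.exp(new_logprob - old_logprob)
    unclipped = ratio * advantage
    clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
    clipped = clipped_ratio * advantage
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -min(unclipped, clipped)


def annealed_lr(initial_lr, update, num_updates):
    """Linearly decay the learning rate; update is 1-indexed."""
    frac = 1.0 - (update - 1) / num_updates
    return frac * initial_lr


# Tiny usage example with made-up numbers.
advantages, returns = compute_gae(
    rewards=[1.0, 1.0], values=[0.0, 0.0], dones=[False, True],
    last_value=5.0, gamma=1.0, lam=1.0,
)
print(advantages, returns)
```

In a batched PyTorch implementation the same recursion typically runs over tensors of shape (steps, envs), and the per-sample losses are averaged over each minibatch before the value loss and entropy bonus are added.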
