CatkinChen committed · Commit 1fabc53 (verified) · 1 Parent(s): e7fcff3

Add model card for ablation_baseline_MiniHack_Room_5x5_v0_20250919-143836

Files changed (1): README.md +5 -5
@@ -19,7 +19,7 @@ This repository contains a complete Sequential Skill RL model trained on NetHack
 
 ### 1. PPO Policy (`ppo_policy.pth`)
 - **Type**: Proximal Policy Optimization agent
-- **Environment**: MiniHack-Room-Random-15x15-v0
+- **Environment**: MiniHack-Room-5x5-v0
 - **Training Steps**: 50,000
 - **Features**:
 - Curiosity-driven exploration: True
@@ -52,7 +52,7 @@ hmm_data = torch.load('hmm_model.pth', map_location=device)
 
 # Use for inference or continued training
 results = train_online_ppo_with_pretrained_models(
-env_name="MiniHack-Room-Random-15x15-v0",
+env_name="MiniHack-Room-5x5-v0",
 vae_repo_id="CatkinChen/nethack-vae-hmm",
 hmm_repo_id="CatkinChen/nethack-hmm",
 test_mode=True
@@ -61,10 +61,10 @@ results = train_online_ppo_with_pretrained_models(
 
 ## Training Configuration
 
-- **Environment**: MiniHack-Room-Random-15x15-v0
+- **Environment**: MiniHack-Room-5x5-v0
 - **Learning Rate**: 0.0005
 - **Batch Size**: 32
-- **Training Time**: 6837.78 seconds
+- **Training Time**: 0.02 seconds
 - **Device**: cuda
 - **Seed**: None
 
@@ -74,4 +74,4 @@ Training completed successfully with the following configuration:
 - Curiosity-driven exploration: True
 - Random Network Distillation: False
 
-Generated on: 2025-09-19 14:19:25
+Generated on: 2025-09-19 14:38:50
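
The environment ID written into the card by this commit appears to mirror the run name in the commit title. A minimal sketch of how that correspondence could be checked is shown below. The naming scheme (`<prefix>_<environment with underscores>_<YYYYMMDD-HHMMSS>`) is an assumption inferred from this one run name, and `parse_run_name` is a hypothetical helper, not part of the author's code.

```python
import re
from datetime import datetime

# Run name taken from the commit title.
RUN_NAME = "ablation_baseline_MiniHack_Room_5x5_v0_20250919-143836"


def parse_run_name(run_name):
    """Split a run name into (prefix, env_id, timestamp).

    Assumes the layout <prefix>_<MiniHack env with underscores>_<YYYYMMDD-HHMMSS>;
    this layout is a guess based on a single example.
    """
    match = re.fullmatch(
        r"(?P<prefix>.+?)_(?P<env>MiniHack_.+)_(?P<stamp>\d{8}-\d{6})",
        run_name,
    )
    if match is None:
        raise ValueError(f"unrecognized run name: {run_name!r}")
    # Environment IDs in the card use hyphens where the run name uses underscores.
    env_id = match.group("env").replace("_", "-")
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")
    return match.group("prefix"), env_id, stamp


prefix, env_id, stamp = parse_run_name(RUN_NAME)
print(prefix, env_id, stamp.isoformat())
```

Comparing the parsed `env_id` against the **Environment** fields in the card is one way to catch a card that still names a different environment than the run it documents.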