---
license: apache-2.0
language:
- en
pipeline_tag: reinforcement-learning
tags:
- web
- game
- CosmicVoyage
---

This model is a reinforcement learning agent trained to autonomously navigate and control the web-based game Cosmic Voyager. Using the Proximal Policy Optimization (PPO) algorithm, the agent learns strategies that maximize in-game performance.

Training Configuration:

- Algorithm: Proximal Policy Optimization (PPO)
- Policy: Convolutional Neural Network (CnnPolicy)
- Learning Rate: 5e-5
- Batch Size: 256
- Number of Steps per Update (n_steps): 2048
- Number of Epochs: 20
- Maximum Gradient Norm (max_grad_norm): 0.75
- Discount Factor (gamma): 0.95
- GAE Lambda (gae_lambda): 0.95
- Clip Range: 0.1
- Entropy Coefficient (ent_coef): 0.02
- Target KL Divergence (target_kl): 0.025
- Total Timesteps: 3,000,000

Policy Architecture:

- Feature Extractor Dimensions: 1024
- Network Architecture:
  - Policy Network (pi): [1024, 512, 256]
  - Value Function Network (vf): [1024, 512, 256]
- Activation Function: LeakyReLU
- Image Normalization: Disabled

Environment Configuration:

- Observation Dimensions: Adjusted to fit the game's requirements
- Frame Stacking: Implemented to provide temporal context

Usage:

This model is designed to be integrated into the Cosmic Voyager game, enabling autonomous gameplay. For integration details and deployment instructions, please refer to the accompanying documentation.

Training Monitoring:

Training progress and metrics were tracked using Weights & Biases under the project 'Cosmic Voyager RL' by the entity 'andiB1293'.

Disclaimer:

This model is tailored specifically for the Cosmic Voyager game environment. Performance in other settings or games may vary. Users are advised to test the model thoroughly in their specific use cases.
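
Example configuration (sketch):

The hyperparameters listed under Training Configuration and Policy Architecture could be expressed in code roughly as follows. This is a minimal sketch, not the original training script. It assumes Stable-Baselines3 (version 1.8 or later for the dictionary form of `net_arch`), which this card does not name; the library is inferred from parameter names such as `CnnPolicy`, `n_steps`, and `target_kl`. The helper names `build_policy_kwargs`, `build_model`, and `minibatches_per_epoch` are hypothetical, and the `env` argument stands in for the Cosmic Voyager environment wrapper, which is not described here.

```python
"""Sketch: the listed hyperparameters as Stable-Baselines3 PPO arguments.

Assumption: the model card does not name the training library; SB3 is
inferred from the parameter names (CnnPolicy, n_steps, target_kl).
"""

# Values copied from the "Training Configuration" list.
PPO_KWARGS = dict(
    learning_rate=5e-5,
    batch_size=256,
    n_steps=2048,
    n_epochs=20,
    max_grad_norm=0.75,
    gamma=0.95,
    gae_lambda=0.95,
    clip_range=0.1,
    ent_coef=0.02,
    target_kl=0.025,
)
TOTAL_TIMESTEPS = 3_000_000


def build_policy_kwargs():
    """Policy settings from the "Policy Architecture" list."""
    # Imported lazily so the constants above load without torch installed.
    import torch.nn as nn

    return dict(
        features_extractor_kwargs=dict(features_dim=1024),
        net_arch=dict(pi=[1024, 512, 256], vf=[1024, 512, 256]),
        activation_fn=nn.LeakyReLU,
        normalize_images=False,
    )


def build_model(env):
    """Create an untrained PPO agent for an image-observation environment.

    `env` is a placeholder: the game wrapper (including frame stacking)
    must be supplied by the caller.
    """
    from stable_baselines3 import PPO

    return PPO("CnnPolicy", env, policy_kwargs=build_policy_kwargs(), **PPO_KWARGS)


def minibatches_per_epoch(n_envs=1):
    """Minibatches per pass over one rollout (n_envs=1 is an assumption)."""
    return (PPO_KWARGS["n_steps"] * n_envs) // PPO_KWARGS["batch_size"]
```

Under these assumptions, training would be started with `build_model(env).learn(total_timesteps=TOTAL_TIMESTEPS)`. With a single environment, each 2048-step rollout splits into 8 minibatches of 256 samples, so an update performs at most 8 × 20 = 160 gradient steps, and fewer whenever the KL-based early stop triggers.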