Q-Learning Agent playing Blackjack-v1
This is a tabular Q-Learning agent trained to play Blackjack-v1 (Gymnasium). The agent was trained for 500,000 episodes.
Evaluation Results
- Mean Reward: -0.15 +/- 0.94
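A figure of this form is typically the mean and standard deviation of per-episode returns over a batch of evaluation episodes. The sketch below shows one way to compute it. The helper name `evaluate_policy`, the episode count, and the use of the population standard deviation are assumptions; this card does not state how the evaluation was actually run.

```python
import statistics


def evaluate_policy(env, policy, n_episodes=1000, seed=None):
    """Run `policy` for `n_episodes` and return (mean, std) of episode returns.

    Hypothetical helper, not part of the published model. `env` is expected
    to follow the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    episode_returns = []
    for episode in range(n_episodes):
        # Seed only the first reset; later resets continue the same RNG stream.
        obs, info = env.reset(seed=seed if episode == 0 else None)
        total_reward = 0.0
        done = False
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            done = terminated or truncated
        episode_returns.append(total_reward)
    # Population standard deviation (an assumption; sample std is also common).
    return statistics.fmean(episode_returns), statistics.pstdev(episode_returns)
```

With Gymnasium installed, `env` could be `gym.make("Blackjack-v1")` and `policy` any callable that maps a raw observation to an action.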
Usage
import gymnasium as gym
import pickle
from huggingface_hub import hf_hub_download
# Define state_mapper if needed for your environment (e.g., Blackjack)
# def map_blackjack_state_to_int(state_tuple, obs_space): ...
def load_from_hub(repo_id, filename):
    pickle_model = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(pickle_model, 'rb') as f:
        downloaded_model_file = pickle.load(f)
    return downloaded_model_file
model_data = load_from_hub(repo_id="shihuai7189/q-Blackjack-v1-experiment", filename="q-learning.pkl")
q_table = model_data["qtable"]
env_id = model_data["env_id"]
# For Blackjack, the observation space is needed for the mapper
# if env_id == "Blackjack-v1":
#     env_temp = gym.make(env_id)  # Create a temp env to get obs_space details
#     blackjack_obs_space_dims = (
#         env_temp.observation_space[0].n,
#         env_temp.observation_space[1].n,
#         env_temp.observation_space[2].n,
#     )
#     env_temp.close()
#     # Then use map_blackjack_state_to_int(raw_state, blackjack_obs_space_dims)
# Example of running the loaded agent (adapt as needed)
# env = gym.make(env_id)
# raw_state, info = env.reset()
# state_idx = map_blackjack_state_to_int(raw_state, blackjack_obs_space_dims) if env_id == "Blackjack-v1" else raw_state
# ... run agent using greedy_policy(q_table, state_idx) ...
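The commented example above relies on two helpers that this card leaves undefined, `map_blackjack_state_to_int` and `greedy_policy`. A minimal sketch of both follows. It assumes the Q-table is indexed as `q_table[state_idx][action]` and that states are flattened in row-major order. Both are assumptions: the mapping used during training is not shown here, and a different ordering would make the table lookups meaningless.

```python
def map_blackjack_state_to_int(state_tuple, obs_space_dims):
    """Flatten (player_sum, dealer_card, usable_ace) into a single row index.

    ASSUMPTION: row-major (C-order) flattening. The mapper actually used to
    train the published Q-table may differ and must be matched exactly.
    """
    index = 0
    for value, size in zip(state_tuple, obs_space_dims):
        value = int(value)
        if not 0 <= value < size:
            raise ValueError(f"state component {value} outside [0, {size})")
        index = index * size + value
    return index


def greedy_policy(q_table, state_idx):
    """Return the action with the highest Q-value for `state_idx`.

    Works for a NumPy array or a list of lists; ties go to the lowest action.
    """
    row = q_table[state_idx]
    return max(range(len(row)), key=lambda action: row[action])
```

With `obs_space_dims = (32, 11, 2)`, the sizes of the three `Discrete` components of the Blackjack-v1 observation space, this mapping yields 704 distinct indices. If the loaded `q_table` has a different number of rows, the training-time mapping was not this one.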