Commit 8f72837 by ch-bz · verified · 1 Parent(s): 0931907

Update README.md

Files changed (1)
  1. README.md +49 -32
README.md CHANGED
@@ -1,32 +1,49 @@
- ---
- tags:
- - Taxi-v3
- - q-learning
- - reinforcement-learning
- - custom-implementation
- model-index:
- - name: q-Taxi-v3
-   results:
-   - task:
-       type: reinforcement-learning
-       name: reinforcement-learning
-     dataset:
-       name: Taxi-v3
-       type: Taxi-v3
-     metrics:
-     - type: mean_reward
-       value: 7.56 +/- 2.71
-       name: mean_reward
-       verified: false
- ---
-
- # **Q-Learning** Agent playing1 **Taxi-v3**
- This is a trained model of a **Q-Learning** agent playing **Taxi-v3** .
-
- ## Usage
-
- model = load_from_hub(repo_id="ch-bz/q-Taxi-v3", filename="q-learning.pkl")
-
- # Don't forget to check if you need to add additional attributes (is_slippery=False etc)
- env = gym.make(model["env_id"])
-
+ ---
+ tags:
+ - Taxi-v3
+ - q-learning
+ - reinforcement-learning
+ - custom-implementation
+ model-index:
+ - name: q-Taxi-v3
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: Taxi-v3
+       type: Taxi-v3
+     metrics:
+     - type: mean_reward
+       value: 7.56 +/- 2.71
+       name: mean_reward
+       verified: false
+ ---
+
+ # **Q-Learning** Agent playing1 **Taxi-v3**
+ This is a trained model of a **Q-Learning** agent playing **Taxi-v3** .
+
+ ## Usage
+ ```python
+ import gymnasium as gym
+ from huggingface_sb3 import load_from_hub
+ import numpy as np
+ import pickle
+
+ # Load the model
+ env_name = "Taxi-v3"
+ model_name = "q-Taxi-v3"
+ model_path = load_from_hub(repo_id="ch-bz/" + model_name, filename="q-learning.pkl")
+ Qtable = pickle.load(open(model_path, "rb"))["qtable"]
+ env = gym.make("Taxi-v3", render_mode="human")
+ state, info = env.reset()
+
+ while True:
+     action = np.argmax(Qtable[state][:])
+     state, reward, terminated, truncated, info = env.step(action)
+     env.render()
+
+     if terminated or truncated:
+         state, info = env.reset()
+ ```
+
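The new Usage snippet replays the greedy policy in an endless render loop, which is useful for watching the agent but does not yield a number comparable to the `mean_reward` entry in the front matter. The sketch below shows one way such a figure could be estimated, assuming it is the mean and population standard deviation of undiscounted episode returns under the greedy policy. The commit does not state how `7.56 +/- 2.71` was computed, so the function names `greedy_action` and `evaluate_greedy` and the parameters `n_eval_episodes`, `max_steps`, and `seed` are hypothetical choices for illustration, not part of the repository.

```python
import statistics


def greedy_action(q_row):
    """Return the index of the largest Q-value in one row (list or NumPy array)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def evaluate_greedy(env, qtable, n_eval_episodes=100, max_steps=99, seed=None):
    """Run the greedy policy and return (mean, std) of undiscounted episode returns.

    Assumes `env` follows the Gymnasium API:
    reset() -> (state, info) and
    step(action) -> (state, reward, terminated, truncated, info).
    The default episode count and step cap are illustrative assumptions.
    """
    returns = []
    for episode in range(n_eval_episodes):
        # Seed only the first reset: the run is reproducible, episodes still differ.
        if episode == 0 and seed is not None:
            state, _ = env.reset(seed=seed)
        else:
            state, _ = env.reset()

        total = 0.0
        for _ in range(max_steps):
            action = greedy_action(qtable[state])
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            if terminated or truncated:
                break
        returns.append(total)

    # Population standard deviation, matching NumPy's default np.std behaviour.
    return statistics.mean(returns), statistics.pstdev(returns)
```

With the objects from the README snippet, a call such as `evaluate_greedy(gym.make("Taxi-v3"), Qtable, seed=0)` would return a `(mean, std)` pair. Creating the environment without `render_mode="human"` keeps the evaluation fast. The result depends on the episode count and seeds, so it should not be expected to reproduce the reported value exactly.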