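# Deep Q-Network (DQN) on CartPole-v0: a small MLP approximates Q(s, a), trained
# from a replay buffer with bootstrap targets from a periodically synced target network.
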
import os 
import sys
import gym
import random
import utils 
import numpy as np

from collections import deque

from keras.layers import Dense
from keras.optimizers import Adam
from keras.models import Sequential
from matplotlib import pyplot as plt 

class DQNAgent:
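	# DQN agent: epsilon-greedy policy over a small MLP Q-network, with replay
	# memory and a separate target network used to compute bootstrap targets.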
	def __init__(self, state_size, action_size):
		self.state_size = state_size
		self.action_size = action_size
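		# Hyperparameters: discount factor, Adam learning rate, epsilon-greedy
		# exploration schedule, minibatch size, and replay warm-up threshold.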
		self.discount_factor = 0.99
		self.learning_rate = 0.001
		self.epsilon = 1.0
		self.epsilon_decay = 0.999
		self.epsilon_min = 0.01
		self.batch_size = 64
		self.train_start = 1000
		self.memory = deque(maxlen=2000)
		self.model = self.build_model()
		self.target_model = self.build_model()

		self.update_target_model()

	def build_model(self):
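		# Two hidden layers of 24 ReLU units and a linear output giving one
		# Q-value per action; trained with MSE loss and the Adam optimizer.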
		model = Sequential()
		model.add(Dense(24, input_dim=self.state_size, activation='relu',
						kernel_initializer='he_uniform'))
		model.add(Dense(24, activation='relu',
						kernel_initializer='he_uniform'))
		model.add(Dense(self.action_size, activation='linear',
						kernel_initializer='he_uniform'))
		model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate))
		return model

	def update_target_model(self):
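		# Copy the online network's weights into the target network.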
		self.target_model.set_weights(self.model.get_weights())

	def get_action(self, state):
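		# Epsilon-greedy: explore with probability epsilon, otherwise act
		# greedily with respect to the predicted Q-values.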
		if np.random.rand() <= self.epsilon:
			return random.randrange(self.action_size)
		else:
			q_value = self.model.predict(state)
			return np.argmax(q_value[0])

	def append_sample(self, state, action, reward, next_state, done):
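		# Store the transition in replay memory and decay epsilon once per step.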
		self.memory.append((state, action, reward, next_state, done))
		if self.epsilon > self.epsilon_min:
			self.epsilon *= self.epsilon_decay

	def train_model(self):
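		# Sample a random minibatch from replay memory and fit the online
		# network toward targets bootstrapped from the target network.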
		if len(self.memory) < self.train_start:
			return
		
		batch_size = min(self.batch_size, len(self.memory))
		mini_batch = random.sample(self.memory, batch_size)

		update_input = np.zeros((batch_size, self.state_size))
		update_target = np.zeros((batch_size, self.state_size))
		action, reward, done = [], [], []

		for i in range(batch_size):
			update_input[i] = mini_batch[i][0]
			action.append(mini_batch[i][1])
			reward.append(mini_batch[i][2])
			update_target[i] = mini_batch[i][3]
			done.append(mini_batch[i][4])

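		# Online-network Q-values for the sampled states, and target-network
		# Q-values for the corresponding next states.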
		target = self.model.predict(update_input)
		
		target_val = self.target_model.predict(update_target)

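		# Bellman targets: reward only for terminal transitions, otherwise the
		# reward plus the discounted max target-network Q-value of the next state.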
		for i in range(batch_size):
			if done[i]:
				target[i][action[i]] = reward[i]
			else:
				target[i][action[i]] = reward[i] + self.discount_factor * (
					np.amax(target_val[i]))

		self.model.fit(update_input, target, batch_size=batch_size,
					   epochs=1, verbose=0)

def run_DQN():
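    # Train the DQN agent on CartPole-v0 for a fixed number of episodes,
    # save the resulting model, and plot the per-episode scores.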
    episodes = 500 
    seed = 1 
    results = []
    game = 'CartPole-v0'

    env = gym.make(game)
    
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n

    agent = DQNAgent(state_size, action_size)

    for e in range(episodes):
        done = False
        score = 0
        state = env.reset()
        state = np.reshape(state, [1, state_size])

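        # Roll out one episode: act, store the transition, and train every step.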
        while not done:
            action = agent.get_action(state)
            next_state, reward, done, info = env.step(action)
            next_state = np.reshape(next_state, [1, state_size])

            agent.append_sample(state, action, reward, next_state, done)
            agent.train_model()

            score += reward
            state = next_state

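            # Sync the target network at the end of each episode.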
            if done:
                agent.update_target_model()

        results.append(score)

    utils.save_trained_model(game, seed, 'DQN', agent.model)

    plt.plot(results)
    plt.show()

if __name__ == '__main__':
    run_DQN()