CartPole DQN
Overview
This tutorial will show you how to solve the popular CartPole problem using deep Q-learning. The CartPole problem is as follows:
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
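Before building any learning code, it can help to see how this environment is exposed through gym. The short sketch below is only illustrative: it assumes the standard CartPole-v1 environment id and the classic gym API in use when this tutorial was written, and it drives the cart with a purely random policy just to show the 4-dimensional observation, the two discrete actions, and the +1 per-timestep reward described above.

import gym

env = gym.make("CartPole-v1")
state = env.reset()                       # 4 numbers: cart position, cart velocity, pole angle, pole tip velocity
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()    # 0 = push left, 1 = push right
    state, reward, done, info = env.step(action)
    total_reward += reward                # +1 for every timestep the pole stays upright
print("Episode reward:", total_reward)
env.close()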
Tutorial
This section will walk you through the steps of solving the CartPole problem with a deep Q-network. This tutorial is written for Python 3.
Packages
You must first pip install the following packages: gym, keras, and numpy (for example: pip install gym keras numpy).
DQN Agent
The first step of our implementation will be creating a DQNAgent object. This object will manage the state of our learning, and is independent of the CartPole problem. It has all the generic parts of a Q-learning agent and can be reused for other deep Q-learning applications.
Start by creating a file DQNAgent.py and including the following imports:
from keras.layers import Input, Dense
from keras.optimizers import RMSprop
from keras.models import Model
from collections import deque
The reason for each import will become apparent as our implementation continues. Next, add a blank DQNAgent class with an empty constructor.
class DQNAgent:
def __init__(self):
pass
This class will take in all of our hyperparameters, so let's update the constructor to accept them. We also provide default values for some of these hyperparameters.
class DQNAgent:
def __init__(self, input_dim, output_dim, learning_rate=.005,
mem_size=50000, batch_size=64, gamma=.99, explore_start=1.0, explore_stop=.01, decay_rate=.0005):
pass
input_dim is the number of input nodes for our DQN.
output_dim is the number of output nodes for our DQN.
learning_rate is a Keras parameter for our network describing how much we value new information.
mem_size is the maximum number of instances allowed in our bucket for experience replay.
batch_size is the number of experience tuples we train our model on each replay event.
gamma is our discount factor for the Bellman equation update (see the sketch after this list).
explore_start is the initial exploration probability.
explore_stop is the lowest the exploration probability can ever get.
decay_rate is the rate at which the exploration probability decays.
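To make the last few parameters concrete, here is a rough sketch of how they are typically used in a DQN of this kind. The exact code comes later in the tutorial; the exponential decay form and the helper names decayed_explore_p and bellman_target below are illustrative assumptions, not part of the agent we are building.

import numpy as np

def decayed_explore_p(step, explore_start=1.0, explore_stop=.01, decay_rate=.0005):
    # Anneal the probability of taking a random action from explore_start down toward explore_stop.
    return explore_stop + (explore_start - explore_stop) * np.exp(-decay_rate * step)

def bellman_target(reward, next_q_values, gamma=.99, done=False):
    # One-step Q-learning target: r + gamma * max_a' Q(s', a'); no bootstrapping past a terminal state.
    return reward if done else reward + gamma * np.max(next_q_values)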
Next, we complete our constructor by saving all of these parameters as instance variables, defining a neural network model, and defining a few other instance variables.
class DQNAgent:
def __init__(self, input_dim, output_dim, learning_rate=.005,
mem_size=50000, batch_size=64, gamma=.99, explore_start=1.0, explore_stop=.01, decay_rate=.0005):
# Save the hyperparameters as instance variables.
self.learning_rate = learning_rate
self.mem_size = mem_size
self.batch_size = batch_size
self.gamma = gamma
self.explore_start = explore_start
self.explore_stop = explore_stop
self.decay_rate = decay_rate
# Define other instance variables.
self.explore_p = explore_start # The current probability of taking a random action.
self.step = 0 # The number of actions taken by our agent so far. Used to calculate explore_p decay.
# Define and compile our DQN.
input_layer = Input(shape=(input_dim,))
hl = Dense(24, activation="relu")(input_layer)
hl = Dense(24, activation="relu")(hl)
hl = Dense(24, activation="relu")(hl)
output_layer = Dense(output_dim, activation="linear")(hl)
self.model = Model(input_layer, output_layer)
self.model.compile(loss="mse", optimizer=RMSprop(lr=learning_rate))
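As a quick sanity check (not part of the tutorial's own files), you can instantiate the agent for CartPole, whose observations have 4 dimensions and which has 2 discrete actions, and confirm that the network produces one Q-value per action:

from DQNAgent import DQNAgent
import numpy as np

agent = DQNAgent(input_dim=4, output_dim=2)
agent.model.summary()                     # three hidden layers of 24 units, 2 linear outputs
fake_state = np.zeros((1, 4))             # a batch containing one CartPole observation
print(agent.model.predict(fake_state))    # one Q-value estimate per action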
