Playing CartPole with Deep cross-entropy method using Julia and MXNet

Hello and happy new year!

Recently I’ve been playing FrozenLake using the cross-entropy method implemented in Julia. This time I have made the task more complicated and implemented the deep cross-entropy method using MXNet in order to play CartPole.

You may ask: what’s the difference?

First of all, in FrozenLake, Taxi, or any similar game the state is described by a single integer value, whereas in CartPole it is described by 4 float values. This makes it impractical to keep an explicit table that maps every state to actions, so we need a more sophisticated approach.
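To make the difference concrete, here is a minimal sketch in plain Python (it is not the Julia code from the repository). The 16 states and 4 actions correspond to the standard 4x4 FrozenLake map. The uniform sampling of CartPole-like states is an assumption made purely for illustration and is not how the real environment produces observations.

```python
import random

N_STATES, N_ACTIONS = 16, 4  # 4x4 FrozenLake: 16 integer states, 4 actions

# Tabular policy: one row of action probabilities per integer state.
policy_table = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]

def tabular_action_probs(state: int):
    """Look up the action probabilities for an integer state."""
    return policy_table[state]

def random_cartpole_like_state(rng):
    """Illustrative stand-in for a CartPole observation: 4 floats
    (cart position, cart velocity, pole angle, pole angular velocity)."""
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(4))

rng = random.Random(0)
samples = [random_cartpole_like_state(rng) for _ in range(1000)]

# Count how many of the sampled continuous states are distinct.
distinct = len(set(samples))
print(tabular_action_probs(3), distinct)
```

With continuous values, practically every sampled state is new, so a table with one row per state never gets revisited. A function that generalizes across nearby states is needed instead.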

This is where neural networks come into play. I had two options: code everything myself or use one of the deep learning frameworks. For this specific task I have decided to use MXNet with the following setup:

  • Multi-class classification problem. CartPole has only 2 actions, but other games might have more.
  • A simple MLP-type network with 24 and 48 neurons in its hidden layers. The number of neurons and layers can be supplied as input parameters in the launch file.
  • ReLU activation function.
  • Uniform weight initialization at the start.
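The setup above can be sketched as a forward pass in plain Python. This is only an illustration of the architecture, not the MXNet model from the repository. The layer order (24 neurons, then 48), the initialization range of [-0.1, 0.1], the zero biases, and the softmax output are assumptions, and all function names are hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng, scale=0.1):
    """Weights drawn uniformly from [-scale, scale] (assumed range); zero biases."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    """Affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def build_mlp(n_inputs, hidden_sizes, n_actions, seed=0):
    """Create one layer per consecutive pair of sizes."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden_sizes, n_actions]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def action_probs(network, state):
    """ReLU on hidden layers, softmax on the output layer."""
    h = list(state)
    for layer in network[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, network[-1]))

# CartPole: 4 state values in, 2 action probabilities out.
network = build_mlp(4, [24, 48], 2)
probs = action_probs(network, [0.02, -0.01, 0.03, 0.04])
print(probs)
```

Because the hidden sizes are passed as a list, changing the number of layers or neurons only means changing that argument, which mirrors the idea of supplying them as launch parameters.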

The code itself reuses a lot of the earlier cross-entropy code. The major changes are support for states of any size and management of the neural network.
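For readers who have not seen the earlier post, the part that carries over is the elite selection step of the cross-entropy method. In the usual formulation, the sessions with the highest total reward are kept, and their state and action pairs become the training set for the classifier. The sketch below is plain Python under those assumptions. The session layout, the function names, and the percentile value are hypothetical, and the repository may do this differently.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def select_elites(sessions, q=70):
    """Keep (state, action) pairs from sessions whose reward reaches the
    q-th percentile of all session rewards."""
    threshold = percentile([s["reward"] for s in sessions], q)
    elite_states, elite_actions = [], []
    for s in sessions:
        if s["reward"] >= threshold:
            elite_states.extend(s["states"])
            elite_actions.extend(s["actions"])
    return elite_states, elite_actions, threshold

def fake_states(n):
    """Made-up 4-float states, only to give the example some data."""
    return [(0.0, 0.0, 0.01 * t, 0.0) for t in range(n)]

sessions = [
    {"states": fake_states(1), "actions": [0], "reward": 10.0},
    {"states": fake_states(2), "actions": [1, 0], "reward": 20.0},
    {"states": fake_states(3), "actions": [1, 1, 0], "reward": 30.0},
    {"states": fake_states(4), "actions": [0, 1, 0, 1], "reward": 40.0},
]

elite_states, elite_actions, threshold = select_elites(sessions, q=50)
print(threshold, len(elite_states), elite_actions)
```

Nothing in this step depends on the size of a state: states are only collected and passed on as features, which is why the same selection logic works for a single integer and for a 4-float vector.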

Another interesting thing about my implementation: we don’t really need GPUs for this task, but I succeeded in running MXNet on GPUs, which made training twice as fast.

Practical RL: https://github.com/dmitrijsc/practical-rl
