Playing FrozenLake using cross-entropy method

Hello, hello!

It has been a long time since I actually developed anything useful, but today I finally finished implementing the cross-entropy method for playing OpenAI toy text games in Julia.


Cross-entropy method (CEM) in a nutshell

So how do we solve the policy optimisation problem of maximising the total reward given some parametrised policy? At any point in time, you maintain a distribution over parameter vectors and move that distribution towards parameters that earn a higher reward. This works surprisingly well, even if it's not that effective when theta is a high-dimensional vector.
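To make the "move the distribution towards better parameters" idea concrete, here is a minimal sketch in Python rather than Julia, so it is not the implementation this post describes. The function name cem_gaussian, the elite_frac and init_sigma parameters, and the toy quadratic reward are all assumptions made for illustration; a real agent would replace toy_reward with the (noisy) total reward of an episode.

```python
import random
import statistics


def cem_gaussian(evaluate, dim, epochs=30, n_samples=50,
                 elite_frac=0.2, init_sigma=2.0, seed=0):
    """Cross-entropy method over a diagonal Gaussian on parameter vectors."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    sigma = [init_sigma] * dim
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(epochs):
        # Sample a batch of candidate parameter vectors (thetas).
        thetas = [[rng.gauss(m, s) for m, s in zip(mean, sigma)]
                  for _ in range(n_samples)]
        # Rank the candidates by reward and keep the best ones (the elite).
        thetas.sort(key=evaluate, reverse=True)
        elite = thetas[:n_elite]
        # Refit the Gaussian to the elite; the small constant keeps sigma
        # from collapsing to exactly zero.
        mean = [statistics.fmean(col) for col in zip(*elite)]
        sigma = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return mean


def toy_reward(theta):
    """Hypothetical reward: highest when theta equals (1.0, -0.5)."""
    target = (1.0, -0.5)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


best = cem_gaussian(toy_reward, dim=2)
print(best)
```

One design choice worth noting: if the initial sigma is too small relative to the distance to the optimum, the distribution can shrink before it gets there, which is why the sketch starts with a fairly wide Gaussian.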

Algorithm

The idea is to initialise the mean and sigma of a Gaussian and then repeat the following for epochs iterations:

  • sample a batch of n_samples values of theta from a Gaussian with the current mean and sigma
  • perform a noisy evaluation to get the total reward for each of these thetas
  • select the best-performing combinations of states and actions
  • count the number of times each action was taken in each state and convert the counts to probabilities
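The last two steps can be sketched for a tabular policy as follows. This is a Python illustration under stated assumptions, not the Julia code from the post: the helper names select_elites and update_policy, the percentile threshold, the uniform fallback for states that no elite session visited, and the hand-written sessions are all hypothetical.

```python
from collections import Counter


def select_elites(sessions, percentile=50):
    """Keep the (state, action) pairs from the best-performing sessions.

    Each session is a tuple (states, actions, total_reward).
    """
    rewards = sorted(reward for _, _, reward in sessions)
    # Reward threshold at the requested percentile.
    idx = min(len(rewards) - 1, int(len(rewards) * percentile / 100))
    threshold = rewards[idx]
    pairs = []
    for states, actions, reward in sessions:
        if reward >= threshold:
            pairs.extend(zip(states, actions))
    return pairs


def update_policy(elite_pairs, n_states, n_actions):
    """Count how often each action was taken in each state and normalise."""
    counts = Counter(elite_pairs)
    policy = []
    for s in range(n_states):
        row = [counts[(s, a)] for a in range(n_actions)]
        total = sum(row)
        if total == 0:
            # Assumption: states never seen in elite sessions stay uniform.
            policy.append([1.0 / n_actions] * n_actions)
        else:
            policy.append([c / total for c in row])
    return policy


# Four made-up sessions over states 0..3 and actions 0..1.
sessions = [
    ([0, 1], [1, 1], 1.0),
    ([0, 1], [0, 1], 0.0),
    ([0, 2], [1, 0], 1.0),
    ([0, 1], [0, 0], 0.0),
]
elite_pairs = select_elites(sessions)
policy = update_policy(elite_pairs, n_states=4, n_actions=2)
print(policy)
```

Here only the two sessions with reward 1.0 survive the threshold, so the new policy always picks action 1 in state 0 because that is what both elite sessions did there.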

So in order to complete the task, I had to implement or reuse the following functionality:

  • ToyTextMDP – an implementation of the toy text problem from OpenAI
  • Cross-entropy Policy – a mapping from every state the agent might be in to an action
  • Cross-entropy Policy Solver – a function that is responsible for minimising the error / maximising the Q value
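To show how the first two pieces fit together, here is a small Python sketch of a stochastic tabular policy playing one session. It is not the ToyTextMDP or the policy type from the Julia code: the four-cell corridor environment, the step and play_session functions, and the constants are hypothetical stand-ins chosen to keep the example short.

```python
import random

N_STATES, N_ACTIONS = 4, 2  # corridor cells 0..3; action 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical corridor dynamics: reward 1.0 on reaching the last cell."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def play_session(policy, rng, max_steps=20):
    """Play one episode; policy[state] holds the action probabilities."""
    states, actions, total_reward = [], [], 0.0
    state = 0
    for _ in range(max_steps):
        # Sample an action according to the policy's probabilities.
        action = rng.choices(range(N_ACTIONS), weights=policy[state])[0]
        states.append(state)
        actions.append(action)
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            break
    return states, actions, total_reward


# Start from a uniform policy: every action is equally likely in every state.
uniform_policy = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
states, actions, total_reward = play_session(uniform_policy, random.Random(0))
print(states, actions, total_reward)
```

The recorded states, actions and total reward are exactly what a solver needs in order to pick the best sessions and re-estimate the policy.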

The model and implementation are extremely simple. You can find an even more detailed explanation on Wikipedia.

You can find my source code on GitHub. I execute week_1.jl to run the CEM.
