Playing FrozenLake with genetic algorithms

Today's topic is taking a random policy and improving it with genetic (evolutionary) algorithms to score well in different versions of FrozenLake.

About FrozenLake, from OpenAI Gym:
The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

Let’s get started!

I should start by letting you know that I will try to use Julia for everything I develop. In the first few posts I will also include the full code so we can get started, and I will most likely link back here in the future.

Let’s start by simply loading the packages and loading the environment:

using OpenAIGym
import Reinforce.action

env = GymEnv("FrozenLake8x8-v0") # FrozenLake-v0 for 4x4
env_actions = env.pyenv[:action_space][:n]
env_space = env.pyenv[:observation_space][:n]
env_space_actions = 0:(env_actions - 1) # valid action ids (Gym counts from 0)

It is important to note that we are trying to keep the code general. That is why we look up `env_actions` and `env_space` from the environment and build `env_space_actions`, the set of valid actions in the game.
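For orientation, here is a small sketch of what those numbers mean for the 8×8 map. It assumes Gym's usual FrozenLake conventions (states numbered row by row starting at 0, and actions 0 to 3 meaning left, down, right, up). The names `ACTION_NAMES` and `state_to_cell` are made up for this illustration and are not part of OpenAIGym.jl.

```julia
# Illustrative only: assumes FrozenLake numbers states row by row from 0
# and uses actions 0=left, 1=down, 2=right, 3=up.
const ACTION_NAMES = ["left", "down", "right", "up"]   # hypothetical helper

# Convert a 0-based state id to a 1-based (row, column) cell on an n×n map.
state_to_cell(s::Integer, n::Integer = 8) = (div(s, n) + 1, mod(s, n) + 1)

n_states  = 8 * 8                 # what env_space would hold for the 8x8 map
n_actions = length(ACTION_NAMES)  # what env_actions would hold

println(state_to_cell(0))         # top-left corner (start tile)
println(state_to_cell(63))        # bottom-right corner (goal tile)
println(0:(n_actions - 1))        # the range of valid action ids
```

Because the sizes come from the environment, switching to the 4×4 map only changes the numbers, not the code that follows.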

Now let’s get to defining our Policy.

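# A policy that is just a lookup table: one action per state
struct StateActionPolicy <: AbstractPolicy
    states::Vector{Int}
end

function action(policy::StateActionPolicy, r, s′, A′)
    # Gym states are 0-based, Julia arrays are 1-based; any non-Int64 state falls back to state 0
    index = (typeof(s′) != Int64 ? 0 : s′) + 1
    policy.states[index]
end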

So what did I do above? I defined a new policy that stores a state-to-action mapping, passed in through its constructor. Next, I defined the `action` function, which returns the mapped action for the current state.
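To make the lookup concrete, here is a stripped-down sketch that runs without Gym. `ToyPolicy` and `toy_action` are stand-ins used only for illustration (the real policy above subtypes `AbstractPolicy` and extends `Reinforce.action`); the point is just the shift from a 0-based state to a 1-based index.

```julia
# Stand-in for StateActionPolicy that does not need OpenAIGym/Reinforce.
struct ToyPolicy
    states::Vector{Int}   # states[i] is the action for 0-based state i - 1
end

# Mirrors the lookup in `action`: non-integer states fall back to state 0.
function toy_action(policy::ToyPolicy, s)
    index = (s isa Integer ? s : 0) + 1
    return policy.states[index]
end

policy = ToyPolicy([2, 2, 1, 0])   # a made-up 4-state policy

println(toy_action(policy, 0))        # action stored for state 0
println(toy_action(policy, 3))        # action stored for state 3
println(toy_action(policy, nothing))  # treated as state 0
```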

So now we have an environment and a policy. Let’s generate a batch of random state-to-action mappings and evaluate them.

play_episode_number = 10^3
rewards = zeros(play_episode_number)
random_policy_values = rand(env_space_actions, env_space, play_episode_number)

function global_episode(env, policy)

    reward = run_episode(env, policy) do
        # Nothing here
    end

    return reward

end

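function play_global_policy(env, policy_id = 1, episode_count = 100)
    # Build the policy from column `policy_id` and sum its rewards over `episode_count` episodes
    policy = StateActionPolicy(random_policy_values[:, policy_id])
    return sum(global_episode(env, policy) for _ in 1:episode_count)
end

# Score every random policy; policy_scores[i] belongs to column i of random_policy_values
policy_scores = map(id -> play_global_policy(env, id), eachindex(rewards));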

A few things we did above:

  1. We generated 10^3 random policies
  2. We evaluated each policy 100 times and collected the results in `policy_scores`
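
The layout of `random_policy_values` is easy to miss: each column is one complete policy, and each row is a state. Below is a toy version of the same generate-and-score idea. It assumes a made-up scoring function, `toy_score`, in place of actually playing FrozenLake, and the sizes are chosen only to keep the output small.

```julia
using Random

Random.seed!(42)                       # fixed seed so reruns match

n_states, n_actions, n_policies = 16, 4, 5
actions = 0:(n_actions - 1)

# One column per policy, one row per state, entries are action ids.
policies = rand(actions, n_states, n_policies)

# Hypothetical stand-in for playing episodes: counts how often a policy
# picks action 2. The real code sums episode rewards instead.
toy_score(policy) = count(a -> a == 2, policy)

scores = [toy_score(policies[:, p]) for p in 1:n_policies]
best = argmax(scores)                  # column index of the best policy

println(size(policies))                # (n_states, n_policies)
println("best policy column: ", best, " with score ", scores[best])
```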

Now we proceed with genetic algorithms. In short, we do the following:

  1. Crossover: take two random policies and build a new one by selecting each action from one or the other
  2. Mutation: take a random policy and randomly change some of its actions
  3. Evaluate the results and keep best policies
  4. Repeat

EPOCHS = 20
IMPUTATIONS = 50
KEEP_RECORDS = 100

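for i = 1:EPOCHS

    # Needed on Julia 1.x when this runs as a script: the loop reassigns these globals
    global random_policy_values, policy_scores

    CURRENT_SIZE = length(policy_scores)

    for j = 1:IMPUTATIONS

        # Three random parents: two for crossover, one for mutation
        policy_selector = rand(1:CURRENT_SIZE, 3, 1)

        # Append the crossover child as a new column, then score that column
        random_policy_values = hcat(random_policy_values, crossover(policy_selector[1], policy_selector[2]))
        policy_scores = vcat(policy_scores, play_global_policy(env, CURRENT_SIZE + j*2 - 1))

        # Same for the mutated child
        random_policy_values = hcat(random_policy_values, mutate(policy_selector[3]))
        policy_scores = vcat(policy_scores, play_global_policy(env, CURRENT_SIZE + j*2))
    end

    # Sort ascending by score and keep the best KEEP_RECORDS policies;
    # both slices must be the same length so columns and scores stay aligned
    indices = sortperm(policy_scores)

    random_policy_values = random_policy_values[:, indices][:, end-(KEEP_RECORDS - 1):end]
    policy_scores = policy_scores[indices][end-(KEEP_RECORDS - 1):end]

end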
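
The loop calls `crossover` and `mutate`, which this post does not show. The sketch below is one plausible way to write them, not necessarily the version used here: uniform crossover copies each state's action from either parent with equal probability, and mutation re-draws each action with a small probability. The `MUTATION_RATE` constant and the toy population size are assumptions made for illustration.

```julia
using Random

Random.seed!(1)

n_states, n_actions = 16, 4
env_space_actions = 0:(n_actions - 1)
random_policy_values = rand(env_space_actions, n_states, 10)  # toy population

const MUTATION_RATE = 0.1   # assumed value, not taken from the post

# Uniform crossover: for every state, copy the action from parent a or b.
function crossover(a::Int, b::Int)
    pa = random_policy_values[:, a]
    pb = random_policy_values[:, b]
    take_a = rand(Bool, length(pa))
    return ifelse.(take_a, pa, pb)
end

# Mutation: re-draw each state's action with probability MUTATION_RATE.
function mutate(a::Int)
    child = random_policy_values[:, a]   # slicing already makes a copy
    for s in eachindex(child)
        if rand() < MUTATION_RATE
            child[s] = rand(env_space_actions)
        end
    end
    return child
end

child = crossover(1, 2)
mutant = mutate(3)
println(child)
println(mutant)
```

Both functions take column indices and return a plain vector, so the result can be appended with `hcat` exactly as the loop does.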

I trained two models: one for FrozenLake-v0, which has a 4×4 map, and one for FrozenLake8x8-v0, which has an 8×8 map.

My models achieve the following results, which are considered good according to the playground:

  • FrozenLake4x4: 0.86 in 10+ epochs
  • FrozenLake8x8: 0.97 in 30+ epochs

Source code – GitHub
