Julia: Q-learning – using value table to solve Taxi-v2

Hello hello! I am currently working on implementing Q-learning to solve Taxi-v2 in Julia.

In its simplest implementation, Q-learning keeps a table of values with one row per state and one column per action available in the environment.
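To make the table concrete, here is a minimal Python sketch (not the Julia code from the repository) of a Q-table for Taxi-v2, which has 500 discrete states and 6 actions, together with the standard tabular Q-learning update. The function names and the values of alpha and gamma are illustrative assumptions.

```python
N_STATES = 500   # Taxi-v2 has 500 discrete states
N_ACTIONS = 6    # south, north, east, west, pickup, dropoff

# One row per state, one column per action; all values start at zero.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Terminal transitions do not bootstrap from the next state.
    """
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Return the action with the highest value in this state's row."""
    row = q[state]
    return max(range(len(row)), key=lambda a: row[a])
```

After a -1 step from state 0 with action 1, that cell drops to -0.1, so the greedy choice in state 0 moves away from action 1.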

Taxi-v2 rules: there are 4 locations (labeled by different letters), and your job is to pick up the passenger at one location and drop them off at another. You receive +20 points for a successful drop-off and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.
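To see how these rules add up, here is a tiny Python sketch that sums the per-step rewards of one episode. It assumes every step yields exactly one of the three outcomes above (as in Gym's Taxi implementation, where the successful drop-off step pays +20 instead of -1); the outcome labels are hypothetical.

```python
# Assumed per-step rewards: an ordinary step costs 1, an illegal
# pick-up/drop-off costs 10, and the successful drop-off pays +20.
STEP_REWARD = {"step": -1, "illegal": -10, "dropoff": 20}


def episode_return(outcomes):
    """Sum the rewards of one episode given a list of step outcomes."""
    return sum(STEP_REWARD[o] for o in outcomes)


# 11 ordinary steps (moves plus a legal pick-up), then the drop-off:
print(episode_return(["step"] * 11 + ["dropoff"]))  # 9
# The same route with one illegal drop-off attempt along the way:
print(episode_return(["step"] * 11 + ["illegal", "dropoff"]))  # -1
```

Every extra step or illegal action comes straight off the +20, so a high average score means short, penalty-free episodes.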

Unfortunately, my code did not reach an average score of 9.7 over 100 consecutive episodes and plateaued at around 6.9 to 7.1. I have compared it with other examples, and it seems that my epsilon implementation differs from those that converged to a higher score.

I use a fixed epsilon across all episodes, which could be the issue. I may need to start with a higher value when the game starts and lower it over time.
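A decaying epsilon can be sketched in a few lines of Python. This is a hypothetical schedule, not the repository's code: exponential decay from 1.0 with an assumed rate of 0.995 per episode, floored at 0.01, combined with standard epsilon-greedy action selection over one row of the Q-table.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def decayed_epsilon(episode, start=1.0, end=0.01, decay=0.995):
    """Exponential decay from `start` towards a floor of `end` (assumed values)."""
    return max(end, start * decay ** episode)
```

Early episodes explore almost uniformly; after roughly a thousand episodes the agent acts greedily 99% of the time.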

Will test it out later. As always, you can find my code here: https://github.com/dmitrijsc/practical-rl

P.S. When I reduced epsilon to 0.01, I reached 8.9! I guess I need to play more with the other parameters.

Importance of learning rate when running Deep cross-entropy method

Today I tried to re-run my code and confirm that I am getting 195 points while playing CartPole. According to the OpenAI website, CartPole-v0 defines “solving” as getting an average reward of 195.0 over 100 consecutive trials.
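The “solved” criterion is a rolling average, which is easy to check. Here is a minimal Python sketch; the function name and the plain list of per-episode rewards as input are assumptions.

```python
from collections import deque


def first_solved_episode(episode_rewards, threshold=195.0, window=100):
    """Return the index of the first episode at which the mean reward over the
    last `window` episodes reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    for i, reward in enumerate(episode_rewards):
        recent.append(reward)
        if len(recent) == window and sum(recent) / window >= threshold:
            return i
    return None
```

Note that a run is not "solved" until at least 100 episodes have been played, even if every single episode scores 200.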

It was a surprisingly challenging task to figure out why my model was not training at all: I kept getting 50% probabilities (no preference between CartPole's two actions) over many iterations.

Initially I suspected an issue in my solver (and there actually was one), but in the end I realized that the learning rate was too small.

I was using the Adam optimiser with its default learning rate of 0.001, which is perfectly fine for most tasks. I suspect I had to increase the learning rate to 0.01 because of my relatively small batch size.
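Why does the learning rate matter so much with Adam? Its bias-corrected update divides the gradient by a running estimate of its magnitude, so each weight moves by roughly the learning rate per step, whatever the gradient's scale. The pure-Python sketch below applies Adam to a single scalar parameter with the usual default betas and eps; it illustrates the update rule only and is not MXNet's implementation.

```python
import math


def adam_steps(grads, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on one scalar parameter (starting at 0) for the given
    sequence of gradients and return the parameter value after each step."""
    theta, m, v = 0.0, 0.0, 0.0
    history = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(theta)
    return history
```

With a constant gradient, 100 steps move the parameter by about 0.1 at lr=0.001 and about 1.0 at lr=0.01, and scaling the gradient by 1000 does not change that. Ten times the learning rate means roughly ten times further per update.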

Anyway, the problem is solved and CartPole is running great!

Playing CartPole with Deep cross-entropy method using Julia and MXNet

Hello and happy new year!

Recently I’ve been playing FrozenLake using the cross-entropy method implemented in Julia. This time I made the task more complicated: I implemented the deep cross-entropy method using MXNet in order to play CartPole.
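The heart of the deep cross-entropy method is the elite-selection step. You play a batch of episodes with the current network, keep only the best ones, and train the network as a classifier to predict the elite actions from the elite states. The Python sketch below shows only the selection step. The (states, actions, total_reward) session format, the 70th-percentile cutoff, and the simple rank-based threshold (used here in place of numpy.percentile) are assumptions, and the MXNet training step is omitted.

```python
def select_elites(sessions, percentile=70):
    """Keep the (state, action) pairs from sessions whose total reward is at
    or above the given percentile of all session rewards.

    `sessions` is a list of (states, actions, total_reward) tuples.
    Returns (elite_states, elite_actions, reward_threshold).
    """
    rewards = sorted(total for _, _, total in sessions)
    # Simple rank-based cutoff (assumption; numpy.percentile interpolates).
    idx = min(len(rewards) - 1, int(len(rewards) * percentile / 100))
    threshold = rewards[idx]
    elite_states, elite_actions = [], []
    for states, actions, total in sessions:
        if total >= threshold:
            elite_states.extend(states)
            elite_actions.extend(actions)
    return elite_states, elite_actions, threshold
```

The elite pairs become the training batch for the next update. Repeating play, select and fit gradually shifts the network's action probabilities towards the behaviour of the best episodes.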

Difference between evolutionary methods and methods that learn value functions

In parallel with the Practical RL course, I am also reading a great book on reinforcement learning. I found this quote worth noting down on the blog: it explains the difference between evolutionary methods and methods that learn value functions.

For example, if the player wins, then all of its behavior in the game is given credit, independently of how specific moves might have been critical to the win. Credit is even given to moves that never occurred! Value function methods, in contrast, allow individual states to be evaluated. In the end, evolutionary and value function methods both search the space of policies, but learning a value function takes advantage of information available during the course of play.

– Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction
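The contrast in the quote can be made concrete with a toy example. In the Python sketch below, the states A to D, the one-episode trajectory and the TD(0) step size are hypothetical. An evolutionary method scores the whole policy with the episode's total reward, so every entry gets the same credit, including state D, which was never visited. TD(0), a value-function method, evaluates each visited state individually from the reward and the next state's estimate.

```python
def evolutionary_credit(policy, total_reward):
    """Evolutionary view: the policy is scored as a whole, so every
    state's action (visited or not) receives the same credit."""
    return {state: total_reward for state in policy}


def td0_values(trajectory, episodes=1, alpha=0.5, gamma=1.0):
    """Value-function view: TD(0) updates each visited state from its reward
    plus the discounted value estimate of the next state (0 after the end).

    `trajectory` is a list of (state, action, reward) tuples, replayed
    `episodes` times."""
    v = {}
    for _ in range(episodes):
        for i, (state, _action, reward) in enumerate(trajectory):
            is_last = i + 1 == len(trajectory)
            next_value = 0.0 if is_last else v.get(trajectory[i + 1][0], 0.0)
            current = v.get(state, 0.0)
            v[state] = current + alpha * (reward + gamma * next_value - current)
    return v


trajectory = [("A", "right", 0.0), ("B", "right", 0.0), ("C", "right", 1.0)]
policy = {"A": "right", "B": "right", "C": "right", "D": "left"}
```

After one episode only C has a non-zero value. After a second episode the reward starts propagating back to B, which is the "information available during the course of play" the quote refers to.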

Switching from Reinforce.jl to POMDPs.jl

A week ago I realized that the Reinforce.jl package is no longer maintained and decided to switch to another one.

I have spent some time rewriting my code using the POMDPs.jl framework and approach, and moved the Reinforce.jl code under “deprecated”.

If you browse my code, you will find implementations of the OpenAI Gym toy text environments, which will be used as the base for all future developments.

Github: https://github.com/dmitrijsc/practical-rl

Playing FrozenLake with genetic algorithms

Today's topic is taking a random policy and improving it with genetic (evolutionary) algorithms to score well on different versions of FrozenLake.

About FrozenLake (from the OpenAI Gym description):
The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

Let’s get started!
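Here is a minimal Python sketch of the idea (not the Julia code from the post). A policy is simply one action per tile of the standard 4x4 map. The population starts as random policies, and each generation keeps the fittest policies and breeds new ones through uniform crossover and random mutation. To keep it short and deterministic, the sketch uses the non-slippery variant of the map. Population size, elite count, mutation rate and the function names are all assumptions.

```python
import random

# Standard 4x4 FrozenLake map: S start, F frozen, H hole, G goal.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_STATES, N_ACTIONS = 16, 4  # actions: 0 left, 1 down, 2 right, 3 up
HOLES = {i for i, c in enumerate("".join(MAP)) if c == "H"}
GOAL = "".join(MAP).index("G")


def step(state, action):
    """Deterministic (non-slippery) move; walking into a wall stays put."""
    row, col = divmod(state, 4)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, 3)
    elif action == 2:
        col = min(col + 1, 3)
    else:
        row = max(row - 1, 0)
    return row * 4 + col


def evaluate(policy, max_steps=100):
    """Fitness: 1.0 if following the policy from S reaches G, else 0.0."""
    state = 0
    for _ in range(max_steps):
        if state in HOLES or state == GOAL:
            break
        state = step(state, policy[state])
    return 1.0 if state == GOAL else 0.0


def crossover(a, b, rng):
    """Uniform crossover: each tile's action is copied from one parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(policy, rng, rate=0.1):
    """Replace each action with a random one with probability `rate`."""
    return [rng.randrange(N_ACTIONS) if rng.random() < rate else x for x in policy]


def evolve(generations=50, pop_size=50, n_elite=10, seed=0):
    """Start from random policies and keep breeding the fittest ones."""
    rng = random.Random(seed)
    population = [[rng.randrange(N_ACTIONS) for _ in range(N_STATES)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        elites = population[:n_elite]
        children = [mutate(crossover(rng.choice(elites), rng.choice(elites), rng), rng)
                    for _ in range(pop_size - n_elite)]
        population = elites + children
    return max(population, key=evaluate)
```

On the slippery version, a single rollout is a noisy fitness signal, so it would make sense to average the success rate over many episodes per policy before ranking.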