Julia: Q-learning – using value table to solve Taxi-v2

Hello hello! I am currently working on implementing Q-learning to solve Taxi-v2 in Julia.

In its simplest implementation, Q-learning is a table of values for every state (row) and action (column) possible in the environment.
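As a minimal sketch (hypothetical names; Taxi-v2 happens to have 500 states and 6 actions), the table and its standard update rule look like this:

```julia
# Tabular Q-learning sketch: one row per state, one column per action.
n_states, n_actions = 500, 6
Q = zeros(n_states, n_actions)

alpha = 0.1   # learning rate
gamma = 0.95  # discount factor

# One update after observing transition (s, a) -> reward r, next state s_next.
function update!(Q, s, a, r, s_next, alpha, gamma)
    # Move Q[s, a] toward the bootstrapped target r + gamma * max_a' Q[s', a'].
    Q[s, a] += alpha * (r + gamma * maximum(Q[s_next, :]) - Q[s, a])
    return Q
end
```

The agent repeatedly calls `update!` after each environment step; over many episodes the table converges toward the optimal action values.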

Taxi-v2 rules: there are 4 locations (labeled by different letters), and your job is to pick up the passenger at one location and drop them off at another. You receive +20 points for a successful drop-off and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.

Unfortunately, my code did not reach an average of 9.7 over 100 consecutive rounds and plateaued at around 6.9–7.1. I compared it with other examples, and it seems my epsilon implementation differs from those that converged to a higher score.

I have a fixed epsilon across all rounds, which could be the issue. It could be that I need a higher value when the game starts and should lower it over time.
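One common way to do that (a sketch, with made-up schedule parameters) is exponential decay toward a small floor, combined with epsilon-greedy action selection:

```julia
# Hypothetical decay schedule: explore a lot early, settle near eps_min later.
eps_start, eps_min, decay = 1.0, 0.01, 0.995

epsilon(round) = max(eps_min, eps_start * decay^round)

# Epsilon-greedy selection over a Q-table row:
# with probability eps pick a random action, otherwise the greedy one.
function choose_action(Q, s, eps)
    rand() < eps ? rand(1:size(Q, 2)) : argmax(Q[s, :])
end
```

Early rounds then behave like the high-epsilon runs that explore broadly, while late rounds exploit the learned table almost deterministically.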

Will test it out later. As always, you can find my code here: https://github.com/dmitrijsc/practical-rl

P. S. When I reduced epsilon to 0.01, I could reach 8.9! I guess I need to play more with the other parameters.

Importance of learning rate when running Deep cross-entropy method

Today I have been trying to re-run my code and confirm I am getting 195 points while playing CartPole. According to the OpenAI website, CartPole-v0 defines “solving” as getting an average reward of 195.0 over 100 consecutive trials.

It was actually a very challenging task to figure out why my model was not training at all and kept outputting 50% action probabilities over multiple iterations.

Initially I suspected issues in my solver (and I actually did have one), but in the end I realized that the learning rate was too small.

I was using the Adam optimiser with its default learning rate of 0.001, which is totally fine for most tasks. I suspect I had to increase the learning rate to 0.01 because of the relatively small batch size.
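A toy illustration of why the step size mattered (this is not the MXNet code, just plain gradient descent on f(x) = x² for the same number of steps):

```julia
# Gradient descent on f(x) = x^2 starting from x = 10.
# gradient of x^2 is 2x, so each step is x -= lr * 2x.
function descend(lr; steps = 100, x = 10.0)
    for _ in 1:steps
        x -= lr * 2x
    end
    return x
end

descend(0.001)  # barely moves toward the minimum at 0
descend(0.01)   # gets much closer in the same 100 steps
```

With too small a step size the model simply cannot move far enough within the training budget, which looks exactly like "not training at all".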

Anyway, the problem is solved and CartPole is running great!

Playing CartPole with Deep cross-entropy method using Julia and MXNet

Hello and happy new year!

Recently I’ve been playing FrozenLake using the cross-entropy method implemented in Julia, but this time I made my task more complicated and implemented the deep cross-entropy method using MXNet in order to play CartPole.
