Playing CartPole using Julia and MXNet implementation of DQN

Finally! I did it! I’ve been struggling for some time trying to make DQN work and could not succeed.

Today I have managed to make it work and solve CartPole from the OpenAI gym using DQN. You know what the problem was? The size of my neural network!


Difference in Q-learning, Sarsa, Expected Value Sarsa for dummies

I have finally implemented three different algorithms that are based on a Q-value. They all use a state-action value table and an ε-greedy policy when choosing the next action. The only thing that differs is how the Q-value is updated.

I have prepared a simple mapping for those who have trouble understanding the differences (a short code sketch follows the list):

  • Q-learning: use the maximum Q-value over the next state's actions as the `next_value`.
  • SARSA: generate the next action with the same ε-greedy policy and use its Q-value from the table as the `next_value`.
  • Expected Value SARSA: use the expected Q-value over the next state's actions as the `next_value` (with a uniform average, the sum of all action values divided by the number of actions).
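
To make the mapping concrete, here is a minimal Julia sketch of the three update targets. It assumes `Q` is a table indexed by `(state, action)` pairs, and `actions` and `epsilon_greedy` are hypothetical helpers, not the exact names from my implementation:

```julia
# Q-learning: bootstrap from the best action in the next state s′
next_value = maximum(Q[(s′, a)] for a in actions(s′))

# SARSA: bootstrap from the action the ε-greedy policy actually picks next
a′ = epsilon_greedy(Q, s′, ϵ)
next_value = Q[(s′, a′)]

# Expected Value SARSA (uniform average over the next state's actions)
next_value = sum(Q[(s′, a)] for a in actions(s′)) / length(actions(s′))

# All three then plug into the same TD update
Q[(s, a)] += α * (r + γ * next_value - Q[(s, a)])
```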

Julia: Solving OpenAI Taxi-v2 using the SARSA algorithm

So, I’ve been very active these two days and managed to implement the SARSA algorithm for solving Taxi-v2.

SARSA and Q-learning look mostly the same, except that in Q-learning we update our Q-function by assuming we take the action `a` that maximises the Q-value of the next state.

In SARSA, we use the same policy that generated the previous action `a` to generate the next action, `a'` (a-prime), whose Q-value we then use in the update.

It all might sound very complicated, but it results in a very small change to the Q-learning algorithm. You can compare my implementation of SARSA and Q-learning to see the difference.
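
To illustrate how small that change is, here is a rough Julia sketch of a single SARSA training episode. The environment interface (`reset!`, `step!`) and the `epsilon_greedy` helper are hypothetical, and `Q` is assumed to already hold an entry for every state-action pair; this is not the exact code from my repository:

```julia
function sarsa_episode!(Q, env; α = 0.1, γ = 0.99, ϵ = 0.1)
    s = reset!(env)                  # hypothetical: returns the initial state
    a = epsilon_greedy(Q, s, ϵ)
    done = false
    while !done
        s′, r, done = step!(env, a)  # hypothetical: returns (next state, reward, done)
        if done
            Q[(s, a)] += α * (r - Q[(s, a)])      # terminal step: no bootstrapping
        else
            a′ = epsilon_greedy(Q, s′, ϵ)         # on-policy: the same policy picks a′
            Q[(s, a)] += α * (r + γ * Q[(s′, a′)] - Q[(s, a)])
            # Q-learning would instead bootstrap with the max Q-value over all actions in s′
            s, a = s′, a′
        end
    end
    return Q
end
```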

I have managed to reach the same score as with Q-learning in about the same time. I guess the Taxi-v2 problem is solved once and for all.

You should also be able to launch Taxi-v2 using the code from my GitHub repository.


Julia: Q-learning and epsilon discount factor

Ohhh! It has been so long since I wrote my last post on Q-learning, and I wouldn’t say I have progressed much since then. I have managed to learn SARSA, but today’s topic will stay Q-learning and the epsilon discount factor, which I missed last time.

Yes, so first of all, let’s clear up a few things. Last time I was trying to reach a score of 9.5 in Taxi-v2, which is wrong. The game is considered solved if you get an average reward of 8.5 over 100 games. Unfortunately, OpenAI did not document this on their website.
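
Here is a tiny Julia sketch of that success check, assuming `rewards` is a vector holding the total reward of each finished episode; the function name and keyword arguments are mine, not anything from OpenAI or my repository:

```julia
using Statistics

# solved when the average reward over the last `window` episodes reaches `threshold`
is_solved(rewards; window = 100, threshold = 8.5) =
    length(rewards) >= window && mean(rewards[end-window+1:end]) >= threshold
```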
