Julia: Q-learning – using value table to solve Taxi-v2

Hello hello! I am currently working on implementing Q-learning to solve Taxi-v2 in Julia.

In its simplest (tabular) form, Q-learning keeps a table of values with one row per state and one column per action possible in the environment.
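Here is a minimal sketch of such a table in Julia, together with the standard one-step Q-learning update. The names `new_qtable` and `update!` and the default learning rate and discount factor are hypothetical choices for illustration. The sizes come from Taxi-v2 itself, which has 500 discrete states and 6 actions.

```julia
# Taxi-v2 has 500 discrete states and 6 actions
# (south, north, east, west, pickup, dropoff).
const N_STATES = 500
const N_ACTIONS = 6

# One row per state, one column per action; every value starts at zero.
new_qtable() = zeros(Float64, N_STATES, N_ACTIONS)

# Tabular Q-learning update for one transition (s, a, r, s_next).
# α is the learning rate and γ the discount factor (hypothetical defaults).
# Indices are 1-based, so Gym's 0-based state/action numbers need a +1.
function update!(Q::AbstractMatrix{Float64}, s::Int, a::Int, r::Real,
                 s_next::Int, done::Bool; α::Real = 0.1, γ::Real = 0.99)
    # A terminal transition has no future value to bootstrap from.
    target = done ? r : r + γ * maximum(Q[s_next, :])
    # Move Q[s, a] a fraction α of the way towards the target.
    Q[s, a] += α * (target - Q[s, a])
    return Q
end
```

Each call nudges a single cell towards "reward now plus the best value reachable from the next state", so the table only improves for the state/action pairs the agent actually visits.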

Taxi-v2 rules: there are 4 locations (labeled by different letters), and your job is to pick up the passenger at one location and drop them off at another. You receive +20 points for a successful drop-off and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.
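To make the scoring concrete, here is a small sketch of the per-step reward described above. The `StepOutcome` labels and the helper names are hypothetical; the real environment computes this internally. It also assumes that the delivering step earns +20 instead of also paying the 1-point step cost, and that an illegal pick-up or drop-off scores -10 in total.

```julia
# Hypothetical labels for what happened on one step; the real environment
# works this out internally from the taxi and passenger positions.
@enum StepOutcome OrdinaryStep SuccessfulDropoff IllegalPickupOrDropoff

# Per-step reward following the rules above. Assumption: the step that
# delivers the passenger earns +20 instead of also paying the 1-point cost.
function step_reward(outcome::StepOutcome)
    outcome == SuccessfulDropoff && return 20
    outcome == IllegalPickupOrDropoff && return -10
    return -1  # moving, or a legal pickup, costs one point
end

# An episode's score is the sum of its per-step rewards.
episode_score(outcomes) = sum(step_reward, outcomes)
```

For example, under these assumptions a 12-step episode (11 ordinary steps, including the pickup, followed by the drop-off) scores 11 × (-1) + 20 = 9, so every extra step directly lowers the score.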

Unfortunately, my code did not reach an average score of 9.7 over 100 consecutive rounds; it plateaued at around 6.9 to 7.1. I have compared it with other examples, and it seems that the way I handle epsilon differs from the implementations that converged to a higher score.
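As I read the target, "9.7 in 100 consecutive rounds" means the average score over the last 100 consecutive episodes. A small sketch of that check, where `solved`, `window` and `threshold` are hypothetical names of my own:

```julia
# Hypothetical helper for the target above: the average score over the
# last `window` consecutive episodes must reach `threshold`.
function solved(scores::AbstractVector{<:Real}; window::Int = 100,
                threshold::Real = 9.7)
    length(scores) < window && return false  # not enough episodes yet
    recent = @view scores[end-window+1:end]
    return sum(recent) / window >= threshold
end
```

Calling it after every episode with the list of scores so far shows when (or whether) training reaches the target.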

I use a fixed epsilon over all rounds, which could be the issue. Perhaps I need a higher value when the game starts and then lower it over time.
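A minimal sketch of that idea: ε-greedy action selection plus a decaying ε. The geometric schedule (from 1.0 down to a floor of 0.01, multiplied by 0.99 per episode) and the names `choose_action` and `epsilon_at` are hypothetical choices, not taken from the examples I compared against.

```julia
# ε-greedy action selection: with probability ε take a random action
# (explore), otherwise take the best-valued action for state s (exploit).
function choose_action(Q::AbstractMatrix, s::Int, ε::Real)
    n_actions = size(Q, 2)
    rand() < ε && return rand(1:n_actions)
    row = Q[s, :]
    # Break ties at random so a fresh all-zero table doesn't always pick action 1.
    best = findall(==(maximum(row)), row)
    return rand(best)
end

# Hypothetical decay schedule: start at ε_start, shrink ε geometrically by
# `decay` per episode, and never go below ε_min.
epsilon_at(episode::Int; ε_start = 1.0, ε_min = 0.01, decay = 0.99) =
    max(ε_min, ε_start * decay^(episode - 1))
```

In the training loop, `epsilon_at(episode)` would be computed once per episode and passed to `choose_action`. Whether a decay of 0.99 per episode is too fast or too slow for Taxi-v2 is something to test.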

Will test it out later. As always, you can find my code here: https://github.com/dmitrijsc/practical-rl

P.S. When I reduced epsilon to 0.01, I reached 8.9! I guess I need to experiment more with the other parameters.
