Importance of the learning rate when running the deep cross-entropy method

Today I have been trying to re-run my code and confirm that it reaches 195 points on CartPole. According to the OpenAI website, CartPole-v0 defines “solving” as getting an average reward of 195.0 over 100 consecutive trials.
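The “solved” criterion is easy to track during training. Here is a minimal sketch (my own helper, not code from this post) that records episode rewards and reports when the average over the last 100 episodes reaches the threshold:

```python
from collections import deque

def make_solved_checker(threshold=195.0, window=100):
    """Return a callable that records episode rewards and reports
    whether the average over the last `window` episodes meets the
    CartPole-v0 'solved' threshold."""
    rewards = deque(maxlen=window)

    def record(episode_reward):
        rewards.append(episode_reward)
        # Only a full window of episodes counts as "solved".
        return (len(rewards) == window
                and sum(rewards) / window >= threshold)

    return record

check = make_solved_checker()
solved = False
for r in [200.0] * 100:  # e.g. total rewards from 100 full episodes
    solved = check(r)
print(solved)  # True: the 100-episode average is 200.0 >= 195.0
```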

It turned out to be surprisingly hard to figure out why my model was not training at all: the policy kept outputting 50% action probabilities, iteration after iteration.

Initially I suspected an issue in my solver (and I did find one), but in the end I realized the learning rate was simply too small.
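For context, the heart of a cross-entropy-method solver is the elite-selection step: keep only the episodes whose total reward clears a percentile cutoff, then train the policy network on their (state, action) pairs. A minimal sketch of that step, in a common formulation and not necessarily the author's exact code:

```python
def select_elites(episodes, percentile=70):
    """Keep episodes whose total reward is at or above the given
    percentile of the batch; the policy network is then fit to the
    (state, action) pairs of these elites.

    episodes: list of (total_reward, transitions) tuples, where
    `transitions` is whatever the rollout recorded (states, actions).
    """
    rewards = sorted(r for r, _ in episodes)
    idx = min(int(len(rewards) * percentile / 100), len(rewards) - 1)
    cutoff = rewards[idx]
    return [(r, t) for r, t in episodes if r >= cutoff]

# Toy batch of 10 episodes with rewards 0..9: a 70th-percentile
# cutoff keeps only the top episodes (rewards 7, 8, 9).
batch = [(float(i), None) for i in range(10)]
print(len(select_elites(batch)))  # 3
```

If the learning rate is too small, the network barely moves toward the elite actions each iteration, which is exactly the flat-50%-probabilities symptom above.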

I was using the Adam optimiser with its default learning rate of 0.001, which is fine for most tasks. I suspect I had to increase it to 0.01 because of my relatively small batch size.
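The effect is easy to see even without a neural network. Below is a pure-Python sketch of the Adam update rule (hyperparameter defaults match `torch.optim.Adam`; this is an illustration, not the original PyTorch training code) on a toy quadratic loss: in a fixed step budget, lr=0.001 barely moves toward the optimum while lr=0.01 gets much closer.

```python
def adam_minimize(lr, steps=200, x0=0.0,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on the toy loss f(x) = (x - 3)^2 and return final x."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2.0 * (x - 3.0)               # gradient of (x - 3)^2
        m = beta1 * m + (1 - beta1) * g   # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)      # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (v_hat ** 0.5 + eps)
    return x

# After 200 steps, lr=0.01 ends up far closer to the optimum x = 3
# than the default lr=0.001 (Adam's step size is roughly lr per step
# while the gradient direction is consistent).
print(abs(adam_minimize(0.01) - 3.0) < abs(adam_minimize(0.001) - 3.0))
```

The same logic applies to the policy network: with few gradient steps per iteration (a small batch of elite transitions), a 10x larger learning rate can be the difference between converging and appearing not to train at all.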

Anyway, the problem is solved and CartPole is training great!
