Today I have been re-running my code to confirm that I get 195 points when playing CartPole. According to the OpenAI website, CartPole-v0 defines “solving” as getting an average reward of 195.0 over 100 consecutive trials.
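To make that criterion concrete, here is a minimal sketch of how the check could be written. The function name `is_solved` and its parameters are my own illustration, not part of any library; it only assumes you have a list of total rewards, one per episode.

```python
from collections import deque


def is_solved(episode_rewards, window=100, threshold=195.0):
    """Return True if any `window` consecutive episodes average >= `threshold`.

    `is_solved` is a hypothetical helper for illustration only.
    """
    recent = deque(maxlen=window)  # holds the last `window` episode rewards
    running_sum = 0.0
    for reward in episode_rewards:
        if len(recent) == window:
            # The oldest reward is about to be pushed out of the window.
            running_sum -= recent[0]
        recent.append(reward)
        running_sum += reward
        if len(recent) == window and running_sum / window >= threshold:
            return True
    return False
```

Calling `is_solved(rewards)` after each training run (or on the reward log) tells you whether some stretch of 100 consecutive episodes reached the 195.0 average.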
It was actually a very challenging task to figure out why my model was not training at all and why I kept getting 50% probabilities over multiple iterations.
Initially I suspected an issue in my solver (and there actually was one), but in the end I realized that the learning rate was too small.
I was using the Adam optimiser with its default learning rate of 0.001. That value is totally OK for most tasks. I suspect I had to increase the learning rate to 0.01 because of a relatively small batch size.
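To get a feel for why the learning rate matters so much here, below is a rough sketch of the standard Adam update rule for a single scalar parameter, written in plain Python rather than in the framework I actually used. The function name `adam_steps` is hypothetical; the values of `beta1`, `beta2` and `eps` are the usual Adam defaults. With a steady gradient, each Adam step has a size of roughly the learning rate itself, regardless of how large the gradient is, so going from 0.001 to 0.01 makes the parameter move about ten times further per update.

```python
import math


def adam_steps(grads, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updates Adam would apply for a sequence of scalar gradients.

    Hypothetical helper for illustration; not tied to any framework.
    """
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the first moment
        v_hat = v / (1 - beta2 ** t)  # bias correction for the second moment
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# Total movement of the parameter after 10 identical gradients.
slow = sum(adam_steps([0.5] * 10, lr=0.001))
fast = sum(adam_steps([0.5] * 10, lr=0.01))
```

Here `slow` comes out at about -0.01 and `fast` at about -0.1. This only illustrates the step size; it does not by itself show that the batch size was the cause in my case.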
Anyway, the problem is solved and CartPole is running great!