To understand all the buzz around DeepMind’s reinforcement learning papers, I decided to implement Deep Reinforcement Learning with:
The neural network was then trained on the OpenAI Lunar Lander environment.
I did my best to implement the above in TensorFlow with just the paper published by DeepMind as a reference. Eventually, however, I succumbed to referencing code snippets from multiple sources on the internet.
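At the core of the approach is Deep Q-Learning: a network estimates a Q-value for each action, and training regresses those estimates toward the Bellman target r + γ · max Q(s′, a′). Here is a minimal numpy sketch of that target computation (function and variable names are my own, not taken from the paper or my TensorFlow code):

```python
import numpy as np

def q_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets for a batch: r + gamma * max_a' Q(s', a'),
    with the bootstrap term zeroed out on terminal transitions."""
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * max_next_q * (1.0 - dones)
```

The `(1.0 - dones)` factor is the standard trick for handling episode ends: when a transition is terminal, there is no future reward to bootstrap from, so the target collapses to just the immediate reward.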
And a pretty good pilot, too. The artificial brain figures out how to land consistently on target after the equivalent of 52 hours of trying:
It looks like a simple game, but trust me, it’s not that simple.
Here’s why:
It starts by just randomly choosing between:
Unsurprisingly, the brain pilots the lander like a pro at this stage:
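The random flailing at this stage is by design: early in training, exploration dominates, and the agent mostly ignores its Q-estimates and picks one of Lunar Lander’s four discrete actions at random. A sketch of the standard epsilon-greedy rule behind this (the names are mine, for illustration):

```python
import random

# Lunar Lander's four discrete actions
ACTIONS = ["noop", "fire_left", "fire_main", "fire_right"]

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: with probability epsilon pick a random action
    (explore); otherwise take the highest-Q action (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Epsilon typically starts near 1.0 (pure randomness, as in the clip above) and is decayed toward a small floor as the Q-estimates become trustworthy.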
Every time the game starts, the lander is tossed in a random direction with varying force; as a result, it learns to cope with less-than-ideal situations, like being tossed hard to the left right at the start:
When it moves off the screen or hits the ground with anything other than its legs, the score drops by a hundred. Hence, very quickly (at just over an hour of trying), it learns to stay alive by simply hovering and keeping away from the edges:
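That penalty is exactly what makes hovering attractive: crashing costs a flat −100, while burning fuel to stay aloft costs only a little per step, so staying alive maximizes expected return early on. A toy sketch of the trade-off (the per-step fuel cost here is an assumed illustrative value, not the environment’s exact number):

```python
def episode_return(steps_survived, crashed, fuel_cost_per_step=0.3):
    """Illustrative return: a flat -100 penalty on crashing versus a
    small per-step fuel cost for hovering (fuel cost is assumed)."""
    crash_penalty = -100.0 if crashed else 0.0
    return crash_penalty - fuel_cost_per_step * steps_survived
```

Under these numbers, hovering for 100 steps yields −30, which comfortably beats crashing immediately at −100, so a greedy learner parks itself mid-air until it discovers the larger reward for actually landing.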
At around 11 hours, it starts to figure out that the ground is nice, but drifts off to the right in the process (notice, too, that its piloting skills are a little shaky at this point):
So here is the result once again. I hope these insights help you appreciate it more this time.