OpenAI Lunar Lander – Solving with Vanilla DQN (aka Reinforcement Learning with Experience Replay)

To understand what all the buzz about DeepMind’s reinforcement learning papers is about, I decided to implement Deep Reinforcement Learning with:

  • Double Q-learning
  • Experience Replay
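The two ingredients above can be sketched in a few lines. This is a minimal illustration of my own (simplified, not the exact training code): a replay buffer that stores transitions for random resampling, and the Double Q-learning target, where the online network *chooses* the next action and the target network *evaluates* it.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the end

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive frames that destabilises naive Q-learning.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def double_q_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning target for a batch.

    The online network picks the best next action; the target network
    supplies its value, which reduces the overestimation of plain DQN.
    """
    best_actions = np.argmax(q_online_next, axis=1)
    evaluated = q_target_next[np.arange(len(rewards)), best_actions]
    # No bootstrapping past a terminal state (done == 1).
    return rewards + gamma * (1.0 - dones) * evaluated
```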

The neural network was then trained on the OpenAI Lunar Lander environment.

I did my best to implement the above in TensorFlow using only the paper published by DeepMind as a reference. Ultimately, though, I succumbed to referencing code snippets from multiple sources on the internet.

It’s Alive!!!

And a pretty good pilot, too. The artificial brain figures out how to land consistently on target after the equivalent of 52 hours of trying:


So What’s The Big Deal?

It looks like a simple game, but trust me, it’s not that simple.

Here’s why:

  • It simulates real-ish physics – meaning there is gravity, momentum, and friction, and the landing legs have spring in them!
  • The engines do not always fire consistently in the same direction; if you look closely enough, you’ll notice that the particles shoot out at randomly varying angles.
  • The artificial brain knows nothing about gravity or what the ground means. It does not even know that it’s trying to land something!
  • All we give the brain is a score to work with: the score goes up as it gets closer to the landing target, and down as it moves further away.
  • The score also goes down whenever an engine is fired; we want the brain to be eco-friendly and not unnecessarily use fuel.
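The scoring scheme described above can be illustrated with a toy function. The coefficients here are my own guesses for illustration, not the environment's actual values: movement toward the pad earns reward, and each engine firing costs a little fuel.

```python
def shaped_reward(prev_dist, new_dist, fired_main, fired_side):
    """Toy sketch of the score described above.

    All coefficients are made up for illustration; the real environment
    uses its own (more elaborate) shaping.
    """
    # Getting closer to the landing target raises the score,
    # moving away lowers it.
    r = 10.0 * (prev_dist - new_dist)
    # Fuel penalties: firing an engine always costs a little,
    # so the brain learns not to burn fuel unnecessarily.
    if fired_main:
        r -= 0.3
    if fired_side:
        r -= 0.03
    return r
```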

In The Beginning…

It starts by just randomly choosing between:

  1. Fire Main Engine (the one below the lander)
  2. Fire Left Engine (to rotate clockwise and nudge it a little in the opposite direction)
  3. Fire Right Engine (does the opposite of Left Engine)
  4. Do nothing
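The "random at first, greedy later" behaviour is the standard epsilon-greedy strategy. A minimal sketch (my own, with the action mapping I believe LunarLander uses): with probability epsilon the agent picks one of the four actions at random; otherwise it picks the action with the highest predicted Q-value. Training starts with epsilon near 1 (pure chance, as described above) and anneals it down.

```python
import random

import numpy as np

# The four discrete actions described above (mapping assumed, not verified).
NOOP, FIRE_LEFT, FIRE_MAIN, FIRE_RIGHT = 0, 1, 2, 3


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        # Explore: ignore the network entirely.
        return rng.randrange(len(q_values))
    # Exploit: trust the network's current Q-value estimates.
    return int(np.argmax(q_values))
```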

Unsurprisingly, the brain pilots the lander like a pro at this stage:


It Copes with Diverse Situations

Every time the game starts, the lander gets tossed in a random direction with varying force; as a result, it learns to cope with less-than-ideal situations – like getting tossed hard to the left at the start:


Learns to Stay Alive

When it moves out of the screen or hits the ground on anything but its legs, the score goes down by a hundred. Hence, very quickly (at just over 1 hour of trying), it learns to stay alive by simply hovering and keeping away from the edges:


Finding Terra Firma

At around 11 hours, it starts to figure out that the ground is a nice place to be, but drifts off to the right in the process (notice, too, that its piloting skills are still a little shaky at this point):



One More Time

So here is the final result once again. I hope the insights above help you appreciate it more this time.