DeepMind, a Google-owned, London-based company founded by former Theme Park programmer Demis Hassabis, let its AI play a variety of Atari 2600 games with no instructions in 2015. Using “reinforcement learning,” the AI taught itself to play 49 games, but it struggled with Montezuma's Revenge, a single-player game first released in 1984.
Now, a little over a year later, the DeepMind team has found a way to make the AI want to win by giving it artificial curiosity: intrinsic rewards for exploring the game. Montezuma's Revenge has three levels, each containing 24 rooms arranged in a pyramid, and each room poses its own challenges. To escape the first room, for example, the player has to climb ladders, avoid a creature, pick up a key and then open doors.
Previously, the AI had no motivation to explore and visited only two rooms. With the help of artificial curiosity, it successfully explored 15 of the 24 rooms and managed to beat the first room in just four tries.
The DeepMind team published their findings in a study, which concludes that an intrinsic reward system improved performance in the more challenging games.
“Drawing inspiration from the intrinsic motivation literature, we use sequential density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary sequential density model,” reads the abstract of the study. “This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.”
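The core idea in the abstract can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the density model below is a simple smoothed count table over discretized states (an assumption for readability), whereas the paper derives pseudo-counts from a sequential density model over raw Atari pixels. The class name, parameters, and reward scaling here are all illustrative choices.

```python
import math
from collections import defaultdict

class PseudoCountExplorer:
    """Toy sketch of pseudo-count exploration bonuses.

    A density model assigns a probability rho(x) to a state before
    observing it and rho'(x) after; the pseudo-count is derived from
    how much that probability rises. Here the "model" is just a
    smoothed empirical count table (an assumption for illustration).
    """

    def __init__(self, beta=0.05, alpha=0.01):
        self.counts = defaultdict(float)  # visits per discretized state
        self.total = 0.0                  # total observations
        self.beta = beta                  # scale of the exploration bonus
        self.alpha = alpha                # smoothing so unseen states get nonzero density

    def _density(self, state):
        # Model probability of `state` under the current counts.
        return (self.counts[state] + self.alpha) / (self.total + 1.0)

    def intrinsic_reward(self, state):
        rho = self._density(state)        # probability before observing x
        self.counts[state] += 1.0         # observe x: update the model
        self.total += 1.0
        rho_prime = self._density(state)  # "recoding" probability after
        # Pseudo-count derived from the density change:
        #   N(x) = rho * (1 - rho') / (rho' - rho)
        pseudo_count = rho * (1.0 - rho_prime) / max(rho_prime - rho, 1e-12)
        # Bonus shrinks as the state becomes familiar, rewarding novelty.
        return self.beta / math.sqrt(pseudo_count + 0.01)
```

In use, a rarely seen room yields a large intrinsic reward, which is added to the game's own score so the agent is pulled toward unexplored states even when the environment gives no points for a long stretch.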