Retro AI

A collection of reinforcement learning agents trained to excel at vintage arcade games

Trained Agents

Super Mario World

This agent was trained using the PPO2 algorithm with a convolutional neural network, and it was eventually able to beat a level of Super Mario World. A Discretizer wrapper was also used to reduce the action space and speed up training.
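
For reference, a minimal sketch of a Discretizer in the style of Gym Retro's examples; the button combos below are illustrative, not necessarily the exact set used for training:

```python
import gym
import numpy as np
import retro

class Discretizer(gym.ActionWrapper):
    """Collapse Gym Retro's MultiBinary button space into a small
    Discrete space built from hand-picked button combos."""

    def __init__(self, env, combos):
        super().__init__(env)
        buttons = env.unwrapped.buttons
        self._actions = []
        for combo in combos:
            arr = np.array([False] * env.action_space.n)
            for button in combo:
                arr[buttons.index(button)] = True
            self._actions.append(arr)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, act):
        # Map a discrete index back to the full button array.
        return self._actions[act].copy()

env = retro.make(game='SuperMarioWorld-Snes')
# Illustrative combos: walk right, jump right, spin-jump right.
env = Discretizer(env, combos=[['RIGHT'], ['RIGHT', 'B'], ['RIGHT', 'A']])
```

Restricting the agent to a few meaningful combos keeps exploration tractable compared to searching the full 2^12 button space.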

Tetris Attack

This agent used the PPO2 algorithm with a multilayer perceptron network. Instead of a scaled-down RGB image, it observed the game state through RAM values read directly from the game.
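
A minimal sketch of that setup, assuming Gym Retro's RAM observation mode and the Stable Baselines implementation of PPO2; the game ID and training budget are assumptions:

```python
import retro
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# obs_type=retro.Observations.RAM returns the raw RAM bytes each step
# instead of a rendered RGB frame, which suits an MLP policy.
env = retro.make(game='TetrisAttack-Snes',  # assumed game ID
                 obs_type=retro.Observations.RAM)

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=1_000_000)  # illustrative training budget
```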

Space Invaders

This agent was trained using the PPO2 algorithm combined with a CNN. After around 20 million timesteps of training, the agent was able to beat a stage of the game. The bot was optimized solely for achieving higher scores.
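
A rough sketch of what that training run looks like, again assuming the Stable Baselines implementation of PPO2; the game ID and save path are assumptions:

```python
import retro
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import DummyVecEnv

# Assumed game ID; the platform isn't stated above.
env = DummyVecEnv([lambda: retro.make(game='SpaceInvaders-Nes')])

model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=20_000_000)  # the ~20M steps described above
model.save('space_invaders_ppo2')  # assumed save path
```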

Samurai Shodown

This agent was trained using the PPO2 algorithm with a multilayer perceptron network. RAM observations were used to speed up training. After 3 million timesteps, the bot was able to defeat its opponent.

Mortal Kombat 3

This agent was trained using the PPO2 algorithm in combination with a convolutional neural network. Eventually the agent gained the skills necessary to defeat the first level of the game. Rewards were given to the agent whenever it dealt damage.
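
A sketch of how that reward shaping might be wired up as a Gym wrapper; the `enemy_health` info key is hypothetical and would come from RAM variables declared in the game's data.json integration:

```python
import gym

class DamageReward(gym.Wrapper):
    """Add a shaped reward for reducing the opponent's health
    between consecutive steps."""

    def reset(self, **kwargs):
        self._last_enemy_health = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        # 'enemy_health' is a hypothetical variable; the real name and
        # RAM address would be declared in the game's data.json.
        health = info.get('enemy_health')
        if health is not None and self._last_enemy_health is not None:
            rew += max(0, self._last_enemy_health - health)
        self._last_enemy_health = health
        return obs, rew, done, info
```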

Kung Fu

This agent was also trained using the PPO2 algorithm and a convolutional neural network. The bot had the most difficulty dodging enemy projectiles but did well when fighting foes head-on.

Airstriker Genesis

Unlike the others, this agent was trained using NEAT, an evolutionary algorithm that evolves neural network topologies. Because every candidate network is scored by actually playing the game, all of the training is watchable. Eventually, the bot was able to beat 3 levels of the game.
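
A minimal sketch of such a NEAT loop using the neat-python library; the config file name, fitness function, and generation count are assumptions:

```python
import neat
import retro

def make_eval(env):
    def eval_genomes(genomes, config):
        for genome_id, genome in genomes:
            net = neat.nn.FeedForwardNetwork.create(genome, config)
            obs, fitness, done = env.reset(), 0.0, False
            while not done:
                env.render()  # every evaluation can be watched live
                # Real runs would downscale the frame first so the
                # evolved networks stay small.
                action = net.activate(obs.flatten())
                obs, rew, done, _ = env.step([o > 0.5 for o in action])
                fitness += rew
            genome.fitness = fitness
    return eval_genomes

env = retro.make(game='Airstriker-Genesis')  # ROM ships with Gym Retro
# 'neat-config' is an assumed file defining population size, the
# network's input/output counts, mutation rates, and so on.
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     'neat-config')
winner = neat.Population(config).run(make_eval(env), 50)
```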

Fighting Masters

This agent was trained using the PPO2 algorithm and a multilayer perceptron network. To speed up training, RAM values were used to observe the game state directly. After 5 million timesteps, the bot was able to defeat the opponent.

Ferrari Grand Prix

This agent was trained using the PPO2 algorithm and a convolutional neural network. After only 1 million timesteps of training, the agent was able to complete all three laps without any significant crash.