This agent was trained using the PPO2 algorithm with a convolutional neural network; eventually it was able to beat a level of Super Mario World. Additionally, a Discretizer was used to reduce the action space and speed up training.
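The discretizer idea can be sketched in a few lines: instead of letting the agent press any of the 2^12 combinations of the SNES's 12 buttons, it chooses from a short list of useful button combos. This is a minimal sketch; the button combos below are illustrative assumptions, not the exact set used for this agent.

```python
# Full SNES button layout as exposed by retro-style environments.
BUTTONS = ["B", "Y", "SELECT", "START", "UP", "DOWN",
           "LEFT", "RIGHT", "A", "X", "L", "R"]

# Hypothetical reduced action set: a handful of combos that cover the
# moves the agent actually needs, shrinking the search space drastically.
COMBOS = [
    ["RIGHT"],       # walk right
    ["RIGHT", "B"],  # jump while moving right
    ["LEFT"],        # walk left
    ["DOWN"],        # duck
    [],              # no-op
]

def discrete_to_buttons(action_index):
    """Translate a discrete action index into a multi-binary button array."""
    pressed = set(COMBOS[action_index])
    return [1 if b in pressed else 0 for b in BUTTONS]
```

A wrapper built around this mapping lets the policy output a single integer in `[0, 5)` rather than a 12-bit vector, which is the speed-up the Discretizer provides.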
This agent utilized the PPO2 algorithm with a multilayer perceptron network. Instead of a scaled-down RGB image, it observed the game state through RAM values read directly from the game.
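Using RAM as the observation amounts to picking out the memory bytes that encode the game state and feeding them to the MLP as a small feature vector. The sketch below shows the idea; the addresses are placeholders, not the game's actual memory map.

```python
import numpy as np

# Hypothetical RAM addresses of state variables (e.g. positions, timers).
FEATURE_ADDRESSES = [0x10, 0x20, 0x30]

def ram_observation(ram):
    """Select a few RAM bytes and scale them to [0, 1] for the MLP policy."""
    ram = np.asarray(ram, dtype=np.float32)
    return ram[FEATURE_ADDRESSES] / 255.0
```

A handful of bytes is far cheaper to process than a full RGB frame, which is why RAM observations pair naturally with a small MLP instead of a CNN.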
This agent was trained using the PPO2 algorithm combined with a CNN; after around 20 million timesteps of training, it was able to beat a stage of the game. The bot was optimized only for achieving higher scores.
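Optimizing only for score usually means the per-step reward is the increase in the in-game score since the previous frame. This is a minimal sketch of that reward signal; the class interface is an illustrative assumption, not the project's actual wrapper.

```python
class ScoreDeltaReward:
    """Turn a running in-game score into a per-step reward (the score delta)."""

    def __init__(self):
        self.prev_score = 0

    def reset(self):
        """Call at the start of each episode."""
        self.prev_score = 0

    def step_reward(self, current_score):
        """Reward is how much the score grew since the last step."""
        reward = current_score - self.prev_score
        self.prev_score = current_score
        return reward
```

Because the reward never mentions level progress, beating the stage is a side effect of chasing points rather than an explicit objective.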
This agent was trained using the PPO2 algorithm with a multilayer perceptron network. RAM observations were used to increase the speed of training. After 3 million timesteps the bot was able to defeat the opponent.
Unlike the others, this agent was trained using NEAT, an evolutionary algorithm for machine learning. Because each genome in the population is evaluated by playing the game, all of the training is watchable. Eventually, the bot was able to beat 3 levels of the game.
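The evolutionary loop behind this kind of training can be sketched without any libraries: keep a population of genomes, score each one with a fitness function (here, that would be watching the genome play the game), keep the best, and fill the population back up with mutated copies. Note this sketch evolves only weights under fixed structure; real NEAT also evolves the network topology. All parameters below are illustrative assumptions.

```python
import random

def evolve(fitness, genome_len=4, pop_size=20, generations=30, seed=0):
    """Toy evolutionary loop: select the fittest genomes, mutate to refill."""
    rng = random.Random(seed)
    # Initial population of random real-valued genomes.
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness; the top quarter survive as parents (elitism).
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]
        pop = [p[:] for p in parents]
        # Refill the population with mutated copies of random parents.
        while len(pop) < pop_size:
            child = [g + rng.gauss(0, 0.1) for g in rng.choice(parents)]
            pop.append(child)
    return max(pop, key=fitness)
```

In the actual agent, `fitness` would run an emulator episode and return a measure of progress through the level, which is exactly why every step of the training can be watched.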