Sabina Chen

Projects

October 17, 2018

Overview

Assignment 6 - Reinforcement Learning with OpenAI Gym (Instructions)

FrozenLake

The reward scheme for FrozenLake is 1 for reaching the goal tile and 0 for every other step, including falling into a hole.
The Q-learning table is 16x4 because the 4x4 board has 16 tiles (states), and from each state there are four possible moves (left, down, right, up).
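As a minimal sketch, the 16x4 table and the reward scheme can be written out in plain Python. This assumes gym's default 4x4 map and gym's action order (0=left, 1=down, 2=right, 3=up). The `MAP_4X4` constant and the `reward` helper are hypothetical names written out for illustration; they are not part of the gym API.

```python
# Hypothetical sketch: a 16x4 Q-table for the 4x4 FrozenLake board.
N_STATES = 16   # one state per tile on the 4x4 board (state = row * 4 + col)
N_ACTIONS = 4   # assumed gym order: 0=left, 1=down, 2=right, 3=up

# Assumed default 4x4 map: S=start, F=frozen, H=hole, G=goal
MAP_4X4 = ["SFFF",
           "FHFH",
           "FFFH",
           "HFFG"]

def reward(state: int) -> float:
    """Reward scheme: 1 for reaching the goal tile, 0 otherwise."""
    row, col = divmod(state, 4)
    return 1.0 if MAP_4X4[row][col] == "G" else 0.0

# Every (state, action) value starts at zero before training.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
```

Each row of `q_table` holds the four action values for one tile, so `q_table[s][a]` is the agent's current estimate of how good move `a` is from tile `s`.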
Yes, the frequency of 1 rewards relative to 0 rewards increased over time. At each step, the agent chooses its next action based on the highest value in the Q-table for its current state. After each action, the Q-table entry for that state-action pair is updated using the reward received, the learning rate, and the estimated value of the state the agent moved into. Because the agent favors the actions with the highest Q-values, and those values increasingly reflect which moves lead to the goal, the number of 1s increases over time as the agent gradually "learns" to navigate the board by continuously updating its Q-table.
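The action choice and the update described above can be sketched as follows. This is the standard tabular Q-learning rule with epsilon-greedy action selection; the epsilon-greedy choice, the discount factor `gamma`, and the helper names `choose_action` and `q_update` are assumptions for illustration, not necessarily what the assignment code used.

```python
import random

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: usually take the best-valued action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties randomly so an all-zero row does not always pick action 0.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def q_update(q_table, s, a, r, s_next, done, lr, gamma):
    """Tabular Q-learning update:
    Q[s][a] += lr * (r + gamma * max(Q[s_next]) - Q[s][a]).
    A terminal step (goal or hole) has no future value, so the target is just r."""
    target = r if done else r + gamma * max(q_table[s_next])
    q_table[s][a] += lr * (target - q_table[s][a])

# Usage: the goal (state 15) is reached by moving right (action 2) from state 14.
q = [[0.0] * 4 for _ in range(16)]
q_update(q, s=14, a=2, r=1.0, s_next=15, done=True, lr=0.8, gamma=0.95)
# State 10 -> down (action 1) -> state 14 earns no reward, but it now inherits
# part of state 14's value through the max(Q[s_next]) term.
q_update(q, s=10, a=1, r=0.0, s_next=14, done=False, lr=0.8, gamma=0.95)
print(q[14][2], q[10][1])
```

This should print values of about 0.8 and 0.608. The second update shows how the reward of 1 propagates backward: tiles that lead toward the goal gain value even though stepping onto them earns 0, which is why the greedy choices produce more 1s over time.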
For smaller learning rates (i.e., lr <= 0.5), the rewards improve more slowly over time: at the end of training there are still many 0s interspersed with the 1s. Higher learning rates (i.e., lr >= 0.7) let the agent learn faster, so there are more 1s than 0s by the end of training.
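One way to explore the learning-rate comparison is a self-contained sketch that trains on a hand-written FrozenLake board and reports the success rate over the final episodes. Several assumptions apply here. Moves are deterministic, whereas gym's default FrozenLake is slippery. Exploration is epsilon-greedy. The episode count, `gamma`, `epsilon`, and the helper names `step` and `train` are hypothetical choices. The numbers it prints are not the results reported above.

```python
import random

MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
# (row, col) deltas in the assumed gym action order: left, down, right, up
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]

def step(state, action):
    """Deterministic (non-slippery) move; walking off the edge leaves you in place."""
    row, col = divmod(state, 4)
    dr, dc = MOVES[action]
    row = min(max(row + dr, 0), 3)
    col = min(max(col + dc, 0), 3)
    tile = MAP_4X4[row][col]
    reward = 1.0 if tile == "G" else 0.0
    done = tile in "GH"  # episode ends at the goal or in a hole
    return row * 4 + col, reward, done

def train(lr, episodes=500, gamma=0.95, epsilon=0.1, max_steps=100, seed=0):
    """Run tabular Q-learning and return the total reward (0 or 1) of each episode."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(16)]
    rewards = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(max_steps):
            # Epsilon-greedy action choice with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
            s, total = s_next, total + r
            if done:
                break
        rewards.append(total)
    return rewards

for lr in (0.1, 0.5, 0.7, 0.9):
    last = train(lr)[-100:]
    print(f"lr={lr}: success rate over last 100 episodes = {sum(last) / len(last):.2f}")
```

Because the seed is fixed, each run is reproducible. Switching to a slippery board would make the outcomes noisier, so any comparison is best made over several seeds rather than a single run.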