Effect of Optimizer Choice on MNIST Score
Git Repository

Results are in the folder training_results.

Problem Statement

To determine how the choice of optimizer affects LeNet-5's performance on the MNIST and Fashion MNIST benchmarks. The baseline is LeNet-5 trained with SGD at a learning rate of 0.001 and momentum of 0.9; without momentum, SGD barely learns at all.

Results

The table reports test-set accuracy from either the 9th or 10th epoch, whichever is higher. Rows beginning with "^ and" combine the ReLU and MaxPool variant with the listed optimizer and learning rate.

| Choices | MNIST accuracy | Fashion MNIST accuracy | Notes |
| --- | --- | --- | --- |
| LeNet-5, lr=0.001 | 11% | 11% | Didn't learn. |
| LeNet-5, lr=0.01 | 91.91% | 77.20% | Slow start; more epochs needed. |
| ReLU, lr=0.001 | 87.28% | 72.69% | _ |
| ReLU, lr=0.01 | 98.94% | 90.22% | Slow start. |
| MaxPool, lr=0.01 | 94.57% | 79.37% | Slow start. |
| ReLU and MaxPool, lr=0.001 | 98.11% | 87.21% | Trained steadily; would benefit from more epochs. |
| ReLU and MaxPool, lr=0.01 | 99.07% | 90.45% | Same as Adam. |
| ^ and ASGD, lr=0.01 | 98.52% | _ | _ |
| ^ and Rprop, lr=0.01 | 91.95% | _ | _ |
| ^ and RMSprop, lr=0.001 | 98.95% | 90.64% | Best Fashion MNIST score. |
| ^ and Adadelta, lr=0.001 | 80.37% | 65.66% | _ |
| ^ and Adafactor, lr=0.01 | 99.15% | 89.87% | Best MNIST score. |
| ^ and Adagrad, lr=0.01 | 98.93% | _ | _ |
| ^ and Adagrad, lr=0.001 | 96.07% | _ | Would likely match lr=0.01 with more epochs. |
| ^ and Adam, lr=0.01 | 98.31% | _ | Jumped around a lot; lr clearly too high. |
| ^ and Adam, lr=0.001 | 99.07% | 90.35% | 97.34% and 84.45% accuracy after the first epoch. |
| ^ and AdamW, lr=0.001 | 98.94% | 89.48% | _ |
| ^ and Adamax, lr=0.001 | 98.95% | 89.07% | _ |
| ^ and NAdam, lr=0.001 | 99.08% | _ | _ |
| ^ and NAdam, lr=0.002 | 99.01% | _ | _ |
| ^ and RAdam, lr=0.001 | 98.97% | _ | Jumps around a fair amount. |
| ^ and RAdam, lr=0.0001 | 98.23% | _ | Needs more epochs. |

...
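
For reference, the sketch below shows how one of these runs could be reproduced: a LeNet-5 with the ReLU and MaxPool substitutions, trained on MNIST or Fashion MNIST with a selectable optimizer from torch.optim. This is a minimal illustration assuming PyTorch and torchvision; the function names, batch sizes, and optimizer mapping are illustrative, not taken from the repository.

```python
# Minimal sketch (assumed PyTorch/torchvision): LeNet-5 with ReLU + MaxPool,
# trained on MNIST or Fashion MNIST with a swappable optimizer.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms


class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.ReLU(),                                   # tanh in the classic LeNet-5
            nn.MaxPool2d(2),                             # average pooling in the classic LeNet-5
            nn.Conv2d(6, 16, kernel_size=5),             # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


def run(optimizer_name="adam", lr=0.001, epochs=10, fashion=False):
    tfm = transforms.ToTensor()
    ds_cls = datasets.FashionMNIST if fashion else datasets.MNIST
    train = torch.utils.data.DataLoader(
        ds_cls("data", train=True, download=True, transform=tfm),
        batch_size=64, shuffle=True)
    test = torch.utils.data.DataLoader(
        ds_cls("data", train=False, download=True, transform=tfm),
        batch_size=1000)

    model = LeNet5()
    optimizers = {
        "sgd": lambda: optim.SGD(model.parameters(), lr=lr, momentum=0.9),
        "adam": lambda: optim.Adam(model.parameters(), lr=lr),
        "rmsprop": lambda: optim.RMSprop(model.parameters(), lr=lr),
        "adagrad": lambda: optim.Adagrad(model.parameters(), lr=lr),
    }
    opt = optimizers[optimizer_name]()
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        model.train()
        for x, y in train:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # Evaluate test-set accuracy after each epoch.
        model.eval()
        correct = 0
        with torch.no_grad():
            for x, y in test:
                correct += (model(x).argmax(dim=1) == y).sum().item()
        print(f"epoch {epoch + 1}: test accuracy {correct / len(test.dataset):.2%}")


if __name__ == "__main__":
    run("adam", lr=0.001)
```

Swapping the entry selected from the optimizer dictionary (or adding others such as RMSprop at lr=0.001) reproduces the kind of comparison summarized in the table above.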