Lessons from My First Neural Network and CNN Build

Goal

Do a practical mini-project to cement what I've been learning about generative AI and neural networks.

I chose the MNIST digit recogniser competition on Kaggle because it’s the “hello world” of computer vision.

Expectations

  • Even though the actual math may be abstracted away in code, I’ll get to “see, feel, understand” what it means to build a neural network.
  • As I spend more time fiddling around with the model and the code, output quality will improve.

I first asked GPT-4o to pick between PyTorch, TensorFlow, and other frameworks. Then I asked it to write pseudocode and follow up with heavily commented code. I kept the neural net trivially small so I could learn the essentials first and move on to larger networks later.
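The post doesn't include the code itself, but a trivially small MNIST network of the kind described might look like this in PyTorch. The layer sizes and the name `TinyNet` are my own illustrative assumptions, not the author's actual values:

```python
import torch
import torch.nn as nn

# A deliberately tiny fully connected network for 28x28 MNIST digits.
# The hidden-layer size (32) is an illustrative assumption.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),        # 28x28 image -> 784-dim vector
            nn.Linear(784, 32),  # one small hidden layer
            nn.ReLU(),
            nn.Linear(32, 10),   # one logit per digit class 0-9
        )

    def forward(self, x):
        return self.net(x)

model = TinyNet()
dummy = torch.randn(1, 1, 28, 28)  # a fake single-image batch
print(model(dummy).shape)          # torch.Size([1, 10])
```

Adding a layer really is just two extra lines inside the `nn.Sequential`, and widening the network is just changing the numbers passed to `nn.Linear`.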

Reality

  • There were no direct mathematical considerations. The most important pieces of code were data preprocessing, choice of optimiser, loss function selection, dataloader batching, and of course hyperparameter tuning. These are largely practical considerations, and many of them are already standardised.
  • The output responded counterintuitively in many places. For example, I decreased the step size (learning rate) by 10x and increased the number of epochs by 50x, expecting better performance. That did not happen. I believe this feels counterintuitive only because I haven't yet built up an intuition from working with these models.
  • PyTorch is highly efficient, and my decision to use a smaller model did not cost much. It's just 2 extra lines of code to add a layer, and increasing the number of neurons is just changing parameters in a function call. Also, as a beginner I had zero clue which models count as large and which as small. I learnt this only after seeing the top submission's CNN code (see below for details).
  • With larger models, it's important to start worrying about overfitting. While this is technically a mathematical issue, there are practical rules of thumb that appear rather standardised: techniques like randomly zeroing some neuron activations during training (dropout), batch normalisation, and so on.
  • A lot of the work in training neural networks is repetitive and can be abstracted out into a much simpler library. I believe fastai does something similar already.
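The practical pieces listed above (dataloader batching, optimiser and loss choice, dropout, batch normalisation) can be sketched in one standard PyTorch training loop. This is a minimal sketch with synthetic random data standing in for MNIST so it's self-contained; the layer sizes, learning rate, and dropout rate are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic "images" and labels stand in for the real MNIST data.
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128),
    nn.BatchNorm1d(128),  # batch normalisation
    nn.ReLU(),
    nn.Dropout(0.5),      # randomly zeroes activations during training
    nn.Linear(128, 10),
)

loss_fn = nn.CrossEntropyLoss()  # standard loss for multi-class classification
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is the "step size"

for epoch in range(2):  # tiny epoch count, just for illustration
    for batch_images, batch_labels in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(batch_images), batch_labels)
        loss.backward()
        optimiser.step()
```

Almost everything here is boilerplate that repeats across projects, which is exactly the repetitive work that libraries like fastai abstract away.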

Outcome

I got about 95.5% cross-validation accuracy locally using my code, along with quite a few useful learnings. On the leaderboard, my 'bignet' got 96.14% and 'hugenet' got 92.76%.

Bonus - hacking CNNs using GPT and public notebooks

Since this was image classification, I figured I'd also learn about CNNs (convolutional neural networks). I found an open notebook on Kaggle which got 98.88% on the public leaderboard. I knew nothing about CNNs, so I gave the code to ChatGPT and said:

I am learning about CNNs via the MNIST classification competition on Kaggle.
Here's a sample code that I want to use to understand CNNs: Please break it down for me using pseudocode, comments, and whatever else is necessary

There were lots of interesting details I found out that you can read about here. Here are a few remarkable ones:

  1. From watching videos, I thought convolutional ‘layers’ were supposed to be hidden inside a network. Here, though, they were the first two layers. It turns out these ‘layers’ are mini neural networks themselves, and the convolution operations are hidden inside those networks.
  2. The convolution operation extracts visual information from the image, such as edges, corners, faces of dogs, etc. This information should be extracted first and then fed into standard MLP layers. It would not make sense to place a convolutional layer after an MLP layer, because the spatial relationships would already be lost.
  3. Techniques like randomly rotating images, adding skew, etc. can be used to increase the dataset size and make training more robust to overfitting.
  4. Even training a CNN-based neural network is rather easy. Training a good one is the hard part. After fiddling around with the open notebook to add cross validation, I submitted it and managed to reduce its score from 98.88% to 98.83%.
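The structure described in points 1 and 2 can be sketched directly: convolutional layers come first to extract spatial features, and only then does the network flatten the image and hand it to standard MLP layers. This is a minimal sketch, not the notebook's actual architecture; the channel counts and kernel sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    # Convolutional front end: learns edge/corner-like filters
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),  # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),  # 14x14 -> 7x7
    # Only now discard the spatial layout and classify with an MLP head
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)

print(cnn(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```

Reversing the order would break point 2: once `nn.Flatten` has run, the pixel grid is gone, so a convolution placed after the MLP layers would have no spatial structure left to exploit. The augmentations in point 3 would be applied to the training images before they reach this network (e.g. random rotations at load time).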

Go Deep Learning!