In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.

Warning: this assignment is out of date. It may still need to be updated for this year's class. Check with your instructor before you start working on this assignment.

This assignment is due before 11:59PM on Thursday, March 18, 2021.

Training Classifiers: Assignment 5

Part 1: Training an Image Classifier

This week we will learn how to use the annotations that you collected as a Requester on Amazon Mechanical Turk to train image classifiers. We will be using fastai, which is a wrapper around PyTorch that helps people get started quickly with deep learning. Deep learning uses multi-layer neural networks. Applications of deep learning can be found all around you, including speech recognition, autonomous driving, and board games.

The creators of Fast AI have created a bunch of very good tutorials that are a hands-on practical introduction to deep learning for coders. You will watch Lesson 1 for an introduction to deep learning, and follow along with their Python notebook.

We’ll then adapt the Fast AI example so that we use deep learning to classify wedding photos in our Colab notebook for image classification. You will write code in several places in our notebook. You’ll load in the aggregated results that we all collected from Amazon Mechanical Turk to get labeled training data. You’ll aggregate the Turkers’ labels with voting to determine whether an image represents a wedding or not. We will use these labels to train the classifier.
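As a rough illustration of aggregating labels with voting, here is a minimal sketch of majority voting in plain Python. The tuple layout `(image_id, worker_id, label)`, the file names, and the function name `aggregate_by_majority` are assumptions made for this example; the data you load in the Colab notebook will likely be structured differently, so treat this as the idea rather than the notebook's solution.

```python
from collections import Counter, defaultdict


def aggregate_by_majority(annotations):
    """Return one label per image, chosen by majority vote.

    `annotations` is an iterable of (image_id, worker_id, label) tuples.
    This layout is hypothetical, not the format used in the notebook.
    """
    votes = defaultdict(Counter)
    for image_id, _worker_id, label in annotations:
        votes[image_id][label] += 1

    aggregated = {}
    for image_id, counts in votes.items():
        # most_common(1) gives the label with the highest count. On a tie,
        # Counter returns the label that was seen first, so collecting an
        # odd number of votes per image avoids ambiguous results.
        aggregated[image_id] = counts.most_common(1)[0][0]
    return aggregated


# Made-up example data: three workers label each of two images.
annotations = [
    ("img_001.jpg", "worker_a", True),
    ("img_001.jpg", "worker_b", True),
    ("img_001.jpg", "worker_c", False),
    ("img_002.jpg", "worker_a", False),
    ("img_002.jpg", "worker_b", False),
    ("img_002.jpg", "worker_c", True),
]

labels = aggregate_by_majority(annotations)
print(labels)
```

Here `True` stands for "wedding" and `False` for "not a wedding". Two of three workers agree on each image, so the majority label wins in both cases.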

We’ll also try training different versions of the wedding photo classifier to see the effects of representation in data collections. The first version of our classifier will be trained only on Western weddings, and the next will be expanded to include Indian weddings as well.

Part 2: Training a Text Classifier

Text classification is one of the tasks addressed in natural language processing (NLP). As with computer vision, NLP uses deep learning. A particular kind of deep learning model that is used in NLP is called the transformer. If you’re interested, you can learn about transformers in this blog post. We’ll be using an implementation of transformers from an open source package called Hugging Face.

For this assignment, we’ll walk through a text classification task called intent detection. When you talk to your Amazon Alexa, it needs to figure out what you’re trying to do. If you say “add five minutes to my chicken timer”, what are you trying to do? Are you trying to play music? Do you want to check the weather? Are you setting a timer? Are you trying to get a recipe to cook something? Depending on what it thinks your intent is, it routes your message to a specialized module to handle your request.
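To make the routing idea concrete, here is a toy sketch of the detect-then-dispatch pattern. Everything in it is hypothetical: the intent names, the handler functions, and especially `detect_intent`, which uses simple keyword matching as a stand-in for the trained transformer classifier you will build in the notebook. It is not how Alexa works internally; it only shows the shape of the pipeline.

```python
def detect_intent(utterance):
    """Hypothetical stand-in for a trained intent classifier.

    A real system would use a learned model; this keyword matching is
    only here so the routing step below has something to dispatch on.
    """
    text = utterance.lower()
    if "timer" in text:
        return "set_timer"
    if "weather" in text:
        return "get_weather"
    if "play" in text:
        return "play_music"
    if "recipe" in text:
        return "get_recipe"
    return "unknown"


# Hypothetical specialized modules, one per intent.
def handle_timer(utterance):
    return f"[timer module] {utterance}"


def handle_weather(utterance):
    return f"[weather module] {utterance}"


def handle_music(utterance):
    return f"[music module] {utterance}"


def handle_recipe(utterance):
    return f"[recipe module] {utterance}"


def handle_unknown(utterance):
    return f"[fallback] {utterance}"


HANDLERS = {
    "set_timer": handle_timer,
    "get_weather": handle_weather,
    "play_music": handle_music,
    "get_recipe": handle_recipe,
}


def route(utterance):
    """Predict the intent, then pass the utterance to the matching module."""
    intent = detect_intent(utterance)
    handler = HANDLERS.get(intent, handle_unknown)
    return handler(utterance)


print(route("add five minutes to my chicken timer"))
```

The dispatch table keeps the classifier and the modules separate, so the keyword stand-in could be swapped for a learned model without touching the handlers.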

Instructions

Watch Lesson 1 on image classification, following along in your own copy of the accompanying Python notebook - it’s fun, and you’ll learn more by running code!
Make a copy of this Google Colab notebook for image classification, and then work through the assignment. The parts that you have to code are marked with
```
##### START CODE HERE
##### END CODE HERE 
```
Make a copy of this Google Colab notebook for text classification, and then work through the assignment. The parts that you have to code are marked with
```
## TO DO:
... 
```
Answer the following Homework 5 questions on Gradescope. There you will submit links to your Colab notebooks with all the outputs shown.

Questions

Below are the questions that you will be asked to answer about this assignment. Please turn in your answers for Homework 5 on Gradescope.

What is the link to your Colab notebook?
What is the difference between classification and regression?
What is a validation set? What is a test set? Why do we need them?
What does it mean to “normalize” images?
What is “overfitting”, and how do you try to avoid it?
Interpret the accuracy plot of the transformer model. Is the accuracy dependent on the train_size parameter? Would you say that the model is performing well overall? Why or why not?