Note: This post is now a wiki, so feel free to add in anything including answering the questions in the last section.
Morning Session 1: Knowledge Sharing Session
@Zaim shared with us his CNN implementation with a superman and birds dataset. You can find his presentation slides here.
Morning Session 2: Planet: Understanding the Amazon from Space, Kaggle Implementation
We did Kaggle Implementations and got everyone on the leaderboards! You can find Han Chong’s example here.
Afternoon Session 1: Fastai v2 Lesson 4: Theory and discussion
Lecture notes here, including the video timeline and supplementary materials.
What is dropout? Why is it used on training sets but not on validation sets?
Randomly zero out a fraction of the activations (often half) at each training step, which reduces overfitting by preventing the network from relying too heavily on any particular activation. It is turned off during validation so that predictions use the full, unperturbed network.
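As a minimal sketch of the mechanism (pure Python, not how a framework would implement it): each activation is dropped with probability p during training, and survivors are rescaled so the expected value stays the same; at evaluation time the function is the identity.

```python
import random

def dropout(xs, p=0.5, training=True):
    """Zero each activation with probability p during training;
    scale survivors by 1/(1-p) so the expected value is unchanged."""
    if not training:
        return list(xs)  # evaluation: identity, no activations dropped
    return [0.0 if random.random() < p else x / (1 - p) for x in xs]

random.seed(0)
acts = [1.0] * 8
out_train = dropout(acts, training=True)   # some zeros, survivors scaled to 2.0
out_eval = dropout(acts, training=False)   # unchanged
print(out_train)
print(out_eval)
```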
What are categorical and continuous data columns? Examples?
Data comes in a number of different types, which determine what kinds of mapping can be used for them. The most basic distinction is that between continuous (or quantitative) and categorical data, which has a profound impact on the types of visualizations that can be used.
The main distinction is quite simple, but it has a lot of important consequences. Quantitative data is data where the values can change continuously, and you cannot count the number of different values. Examples include weight, price, profits, counts, etc. Basically, anything you can measure or count is quantitative.
Categorical data, in contrast, is for those aspects of your data where you make a distinction between different groups, and where you typically can list a small number of categories. This includes product type, gender, age group, etc.
Example Datasets: https://www.kaggle.com/residentmario/iowa-liquor-sales/data
What are embeddings?
An embedding maps each value of a categorical column to a learned vector of continuous numbers (a trainable lookup table), so that categorical inputs can be fed into the network alongside the continuous ones.
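A small PyTorch sketch of the idea, using a hypothetical day-of-week column with 7 categories (the sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical categorical column: day of week, 7 categories.
n_categories, emb_dim = 7, 4
emb = nn.Embedding(n_categories, emb_dim)  # a learnable 7x4 lookup table

days = torch.tensor([0, 3, 6])   # integer indices, e.g. Mon, Thu, Sun
vectors = emb(days)              # each index becomes a 4-dim continuous vector
print(vectors.shape)             # torch.Size([3, 4])
```

The rows of the table are trained along with the rest of the network, so categories that behave similarly end up with similar vectors.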
What does tokenization do? What is its significance in NLP?
Tokenization is splitting text into minimal meaningful units so that it can be processed.
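A naive sketch of what a tokenizer does, using a simple regex (real tokenizers such as spaCy, which fastai uses, handle far more cases like contractions and abbreviations properly):

```python
import re

def tokenize(text):
    """Naive tokenizer: lowercase the text, then split off runs of
    letters/digits and individual punctuation marks as separate tokens."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

print(tokenize("The dog didn't bark."))
# ['the', 'dog', 'didn', "'", 't', 'bark', '.']
```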
What does min_freq refer to and why is it used?
The minimum number of times a word must occur in the corpus to be assigned its own integer index. Words that occur fewer times than min_freq are mapped to a shared "unknown" token instead, which keeps the vocabulary small and avoids learning from words seen too rarely to be meaningful.
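A toy sketch of how a vocabulary with a min_freq cutoff could be built (the function and token names here are illustrative, not fastai's actual implementation):

```python
from collections import Counter

def build_vocab(tokens, min_freq=2):
    """Map each token that appears at least min_freq times to an integer;
    everything rarer falls back to the shared '<unk>' index 0."""
    counts = Counter(tokens)
    vocab = {"<unk>": 0}
    for tok, n in counts.most_common():
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

tokens = "the cat sat on the mat the cat".split()
vocab = build_vocab(tokens, min_freq=2)
ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
print(vocab)  # 'the' and 'cat' are kept; 'sat', 'on', 'mat' map to <unk>
print(ids)
```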
Afternoon Session 2: Fastai Lesson 4 Implementation
Desmon walked us through the implementation of Fastai lesson 4.
Additional Questions asked in the Feedback Forms:
Can AI make a summary out of a long report?
Yes. You can use pretrained word vectors such as word2vec or GloVe as your embedding, feed the text through an encoder-decoder sequence-to-sequence (seq2seq) architecture, and add an attention mechanism to prioritize which parts of the input to focus on.
Related: there is a research paper that shows how NLP is used to extract core sentences from research papers. The technique used is not deep learning but an NLP library called NLTK. (https://nurture.ai/p/fc22e59c-af4a-41b9-b49b-7ebfb59419e9)
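In the spirit of that non-deep-learning approach, here is a toy extractive summarizer: score each sentence by the corpus-wide frequency of its words and keep the top scorer. (NLTK would give you better tokenization and stopword handling; this is just a hand-rolled sketch.)

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Toy extractive summarizer: score each sentence by how frequent its
    words are across the whole text, then keep the top-scoring sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())),
        reverse=True,
    )
    return scored[:n_sentences]

report = ("Sales of widgets rose sharply. Widgets now dominate sales. "
          "The office cat slept all day.")
summary = summarize(report)
print(summary)  # ['Sales of widgets rose sharply.']
```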
I still can’t see how the sequence of words in a sentence affects the predicted next word. I think I’m missing the part where the statistical data extracted from the training data is used in the neural network.
Think of it this way: if I have a sentence “A dog …”, some words have a higher chance of following it than others. Since the word ‘dog’ is there, you might expect ‘bark’ rather than ‘meow’ to come next. Look up n-gram models to get a more statistical intuition for this.
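The n-gram intuition can be shown with a tiny bigram counter over a made-up corpus: count which word follows which, and the most frequent follower approximates the most likely next word.

```python
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Count word-successor pairs: P(next | prev) is proportional
    to how often `next` followed `prev` in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

corpus = ["the dog barks", "a dog barks", "the cat meows"]
model = bigram_model(corpus)
print(model["dog"].most_common(1))  # [('barks', 2)] -- likeliest after 'dog'
```

A neural language model learns a smoother version of these statistics in its weights instead of storing raw counts.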
Which is more computationally expensive to do? Computer vision or NLP?
It’s hard to say, since both CV and NLP are pretty broad fields with different types of implementations. But if we compare the most basic architectures for each (CNNs for CV and RNNs for NLP) then RNNs are computationally more expensive.
You use an RNN with sequential data so that it can capture the dependencies between tokens in the sequence, which helps with classification.
Training RNNs requires a considerable amount of data and many iterations to be effective, and as RNN architectures grow in complexity (vanilla RNN vs. LSTM), even more data is needed for training.
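Part of the cost comes from the sequential nature of the computation, which this small PyTorch sketch makes visible (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)
seq = torch.randn(10, 8)   # a sequence of 10 token vectors
h = torch.zeros(16)        # initial hidden state

# Each step consumes the previous hidden state, so the 10 steps must run
# one after another -- unlike a CNN, which convolves all positions at once.
for x in seq:
    h = rnn_cell(x.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
print(h.shape)  # torch.Size([16])
```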
I googled some links that talk about how embeddings can actually help the machine understand semantics, e.g. https://www.tensorflow.org/versions/master/programmers_guide/embedding. From what I gather, after the machine has learnt the model, the vector for each word should contain its latent features. For example, blue, yellow and red would end up with similar latent features that somehow indicate they are colours. Is this accurate?
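That is roughly the idea: words that behave similarly end up with nearby vectors, and "nearby" is usually measured with cosine similarity. A sketch with made-up 3-d vectors standing in for learned embeddings (real embeddings are higher-dimensional and trained from data):

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for same direction, lower for dissimilar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors: the two colour words point in a similar direction.
vectors = {
    "blue":   [0.9, 0.8, 0.1],
    "red":    [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}
sim_close = cosine(vectors["blue"], vectors["red"])
sim_far = cosine(vectors["blue"], vectors["banana"])
print(sim_close, sim_far)  # the colour pair scores higher
```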
When to use embedding and how to properly use it?
Is the way of NLP flow of neural network similar to Image Recognition?
Firstly, Image Recognition is a task under Computer Vision, which, like NLP, is a branch of AI. The analogous tasks to Image Recognition under NLP would be document/sentence classification.
Neural network flows for (most?) Image Recognition architectures are :
(raw input) Image in the form of pixels -> (data modelling) CNNs -> (classifiers for prediction) Linear layers
Meanwhile, for simple text classification tasks, e.g. given two documents we want to predict their genre, the flow could be:
(raw input) Free-form text -> (data modelling) transform the input using an encoding of choice, which could be an embedding (NN), one-hot encoding (non-NN), etc. -> (classifiers for prediction) Linear layers
I guess in terms of the “flow” for analogous classification tasks across branches, it’s the same. However, the underlying architecture used to model the data needs to differ because of the nature of the data itself. Bear in mind that an architecture style that is state-of-the-art in domain A might not give state-of-the-art performance in domain B.
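The two flows above can be sketched side by side in PyTorch (all sizes and class counts here are made up; both pipelines end in the same kind of linear classifier head):

```python
import torch
import torch.nn as nn

# Image flow: pixels -> CNN -> linear classifier.
image_model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),  # 2 classes
)

# Text flow: token ids -> embedding -> pooling -> linear classifier.
class TextModel(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=16, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.head = nn.Linear(emb_dim, n_classes)

    def forward(self, ids):
        return self.head(self.emb(ids).mean(dim=1))  # average over sequence

image_logits = image_model(torch.randn(1, 3, 32, 32))
text_logits = TextModel()(torch.randint(0, 100, (1, 12)))
print(image_logits.shape, text_logits.shape)  # both torch.Size([1, 2])
```

Only the data-modelling stage in the middle differs; the raw-input and prediction stages are structurally the same.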
From my understanding we usually use embedding when the data is categorical or using text. Is there any other data that need to use embedding?
Yes, embeddings are often associated with “latent features” or abstractions of a data set. As you will see in the upcoming fastai lesson (lesson 5), there will be some use of embeddings in a Machine Learning algorithm known as Collaborative Filtering.
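As a preview of that collaborative filtering use, here is a minimal sketch (a simple dot-product model, not fastai's exact implementation): users and items each get an embedding, and the predicted rating is their dot product.

```python
import torch
import torch.nn as nn

class DotProductCF(nn.Module):
    """Minimal collaborative filtering: users and items each get an
    embedding vector; a predicted rating is their dot product."""
    def __init__(self, n_users, n_items, emb_dim=5):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)

    def forward(self, users, items):
        return (self.user_emb(users) * self.item_emb(items)).sum(dim=1)

model = DotProductCF(n_users=10, n_items=20)
preds = model(torch.tensor([0, 1]), torch.tensor([5, 7]))
print(preds.shape)  # torch.Size([2]) -- one predicted rating per pair
```

Notice there is no categorical "text" or "tabular" data here at all: the embeddings are the latent features of users and items themselves.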
How to define which data are categorical or continuous?
Categorical data have no ordering relationship between points: there is no sense of A > B > C, and every label carries the same weight, so they have to be treated equally. Examples include types of building or units of currency.
In contrast, continuous values, typically numeric, do have such relationships. You can therefore do statistical analysis on them (mean, max, median) or perform operations such as data imputation and outlier removal. Examples include timestamps, temperature, amounts of money, etc.
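In practice, a quick first pass with pandas dtypes can separate the two kinds of columns (the table below is made up for illustration; dtype alone is a heuristic, e.g. an integer ID column is really categorical):

```python
import pandas as pd

# Hypothetical table mixing the two kinds of columns.
df = pd.DataFrame({
    "building_type": ["office", "retail", "office"],  # categorical
    "currency":      ["USD", "MYR", "USD"],           # categorical
    "temperature":   [21.5, 30.2, 25.0],              # continuous
    "amount":        [1000, 2500, 1800],              # continuous
})

categorical = df.select_dtypes(include="object").columns.tolist()
continuous = df.select_dtypes(include="number").columns.tolist()
print(categorical)          # ['building_type', 'currency']
print(continuous)           # ['temperature', 'amount']
print(df["amount"].mean())  # statistics make sense only for continuous columns
```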