AMA with Nurture.AI Research Geeks (week of 22nd January 2018)


These are the 5 papers that we will be covering in the AMA from 5am to 10am UTC on Friday, 26 January 2018.

  1. A Multi-Agent Reinforcement Learning Model of Common-Pool Resource Appropriation
  2. Data Augmentation by Pairing Samples for Image Classification
  3. Generating Adversarial Examples with Adversarial Networks
  4. Less is More: Culling the Training Set to Improve Robustness of Deep Neural Networks
  5. eCommerceGAN: A Generative Adversarial Network for E-commerce

For the ground rules during the AMA - please refer to this post.

Status: Closed

Post AMA Edit: That’s it folks, the AMA session has come to a close. We will not be answering any more questions. Drop by again next Friday for another round of AMAs, on 5 new papers that the Research Geeks have read. We’ll be releasing our list again around Monday next week, so stay posted!

Till next week!


Hi @jameslee and @RouEnLee, thanks for organizing this. I really appreciate it.

I have a question on the second paper, Data Augmentation by Pairing Samples for Image Classification.

On page 4, item number 4): how do we decide when is a good time to disable Sample Pairing? The paper mentions that they turn it off after the accuracy becomes mostly stable, but I’m not sure what this means.


Hi @NurulAisharozman!

Good question! If you look at the upper graph in Figure 2 on page 5 and the graph in the appendix, you can see that the authors disabled Sample Pairing after 800 epochs or so. You can use that as a rough guide for how you set up your own implementation.

As to what a stable accuracy means: there shouldn’t be a large variance between the crests and troughs of the validation set error rate while Sample Pairing is being turned on and off intermittently. A simple way to check whether the accuracy has stabilized is to compare each crest and trough with the previous ones. If the changes are under a certain threshold (e.g. +/- 0.02), you can assume the accuracy to be stable.
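That stability check could be sketched like this (the function name, window size, and 0.02 threshold are my own illustrative choices, not from the paper):

```python
def has_stabilized(accuracies, window=4, threshold=0.02):
    """Return True when the last `window` accuracy readings (taken at the
    crests/troughs of the curve) differ from each other by less than
    `threshold`."""
    if len(accuracies) < window:
        return False
    recent = accuracies[-window:]
    return max(recent) - min(recent) < threshold

# Example: accuracy readings sampled once per enable/disable cycle
readings = [0.61, 0.68, 0.72, 0.74, 0.745, 0.75, 0.748]
print(has_stabilized(readings))  # → True (last 4 readings vary by < 0.02)
```

You would call this once per Sample Pairing cycle and disable the augmentation for good once it returns True.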


Hi, may I know what a distilled model is in the context of black box attacks? Also, I don’t have much background in mathematics; could you please give me an intuitive explanation of equation (5)?

@jameslee @RouEnLee


This is regarding the paper Generating Adversarial Examples with Adversarial Networks btw!


Oh I see, thanks for the reply @RouEnLee!

What about the choice between enabling intermittent sample pairing for 8 epochs and disabling it for 2 epochs versus enabling it for 300k images and disabling it for 100k images? Does that have a significant effect on the final accuracy?


Hello @tarrz20jen! Thanks for joining us here on the AMA. :+1:

To answer your question: we would like to transfer the knowledge of the black box model to a simpler, smaller model called the distilled model. This distilled model’s output must be similar to the black box model’s output.

There are many possible forms the distilled model f can take. Equation (5) essentially says: give me the particular f for which the cross entropy loss is lowest. The cross entropy loss compares the output distributions of the distilled and black box models, and becomes smaller as the two outputs grow more similar.

Note that the weights of the black box model do not appear in any of the equations. After all, a black box model is one where the “internal workings” (i.e. parameters and weights) of the model are not known.
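To make the idea concrete, here is a minimal sketch of that objective, assuming both models are represented only by their softmax output vectors over a batch of queried inputs (all names and numbers below are illustrative, not from the paper):

```python
import math

def cross_entropy(black_box_probs, distilled_probs, eps=1e-12):
    """Average cross entropy over a batch of output distributions.
    Smaller means the distilled model's outputs are closer to the
    black box model's outputs."""
    total = 0.0
    for b_row, f_row in zip(black_box_probs, distilled_probs):
        total -= sum(b * math.log(f + eps) for b, f in zip(b_row, f_row))
    return total / len(black_box_probs)

b = [[0.9, 0.1], [0.2, 0.8]]       # black box outputs (from queries)
f_good = [[0.85, 0.15], [0.25, 0.75]]  # a distilled model with similar outputs
f_bad = [[0.4, 0.6], [0.6, 0.4]]       # a distilled model with dissimilar outputs

print(cross_entropy(b, f_good) < cross_entropy(b, f_bad))  # → True
```

Equation (5) then just asks for the f that minimizes this quantity; note that only the black box’s *outputs* are needed, never its weights.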

Hope that answers your question!


I think the authors of the paper were just testing different methods to see if they gave different results. I highly recommend that you test both out and see which of the two gives you the best validation loss. :slightly_smiling_face:


Q: When is a good time to disable Sample Pairing? (in Paper 2: Data Augmentation by Pairing Samples…)
A good time to disable Sample Pairing completely is when the error rate falls below the baseline and fluctuates less violently. Note that during training, Sample Pairing is disabled intermittently: 8 epochs enabled, 2 epochs disabled.
E.g. the training error on page 6.
E.g. the validation error on page 5 shows the error rate falling below the baseline after Sample Pairing is disabled completely.

As @RouEnLee points out, that’s about 800 epochs on the CIFAR-10 dataset. For the ILSVRC dataset, more than 2300 x 100k images were trained on before Sample Pairing was disabled completely, followed by fine tuning.

Also take into consideration the type of neural network architecture and the dataset size you are working with.
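The intermittent schedule above (8 epochs on, 2 epochs off, repeated until you disable it for good and fine-tune) could be sketched as a simple epoch test; the function name and constants mirror the description above but are otherwise my own:

```python
# 8 epochs with SamplePairing enabled, then 2 epochs disabled, repeating.
ENABLE_EPOCHS, DISABLE_EPOCHS = 8, 2
CYCLE = ENABLE_EPOCHS + DISABLE_EPOCHS

def sample_pairing_enabled(epoch):
    """True during the first 8 epochs of each 10-epoch cycle."""
    return epoch % CYCLE < ENABLE_EPOCHS

# First 12 epochs of the schedule:
print([sample_pairing_enabled(e) for e in range(12)])
# → 8x True, 2x False, then True again
```

In a real training loop you would check this flag each epoch, and stop calling it entirely once the accuracy stabilizes, switching to plain fine tuning.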


Ah okay, how do I find the baseline error rate?

I see that the graphs for the ILSVRC dataset don’t go by epochs. Instead they use the number of trained images. :confused:


Applying Sample Pairing on CIFAR-10 shows a more than 2x higher training error rate, and on ILSVRC a more than 10x higher one.

Keep in mind the type of neural network architecture and the data type and size used.


You will find it on the blue line under ‘fine tuning’ in the images.

Yes, you are right, they measured by trained images. I stand corrected.


@NurulAisharozman @KCKhoo

Bear in mind that the higher training loss is expected, since you are intentionally confusing the model with the SamplePaired images. The result, however, is a lower validation loss, which leads to better accuracy.
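The augmentation itself is easy to picture: average the pixels of two training images but keep only the first image’s label, so the label is “wrong” for half of the mixed content, which is exactly why the training loss rises. A minimal sketch (images as flat lists of pixel values; names are illustrative):

```python
import random

def sample_pair(image, label, dataset):
    """Mix `image` with a randomly chosen training image; keep `label`."""
    other_image, _other_label = random.choice(dataset)
    mixed = [(a + b) / 2 for a, b in zip(image, other_image)]
    return mixed, label

# With a one-image "dataset" the choice is deterministic:
mixed, y = sample_pair([0.0, 0.25], 0, [([1.0, 0.75], 1)])
print(mixed, y)  # → [0.5, 0.5] 0
```

The model sees a blend of a cat and a dog but is still told “cat”, hence the confusion during training.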


Hi, sorry for the late question.

This is regarding the paper eCommerceGAN. I have a question: under order representation, I understand we do word embeddings for products, but I’m not too sure about the word2vec representation and IDF. What do they mean? Is there research that talks about them further?

Sorry, I am still a beginner in deep learning. Many thanks.


Hi @jianmengtan

Word2vec basically represents words as meaningful vectors. This way, we can easily quantify how similar (or different) two words are by comparing their corresponding vectors. Inverse Document Frequency (IDF) gives an intuitive sense of how much information a word contains. To give you an example, the words “and”, “the” and “it” have very low information content, and they appear in almost every document. IDF captures this mathematically by counting the number of documents a word appears in and taking the inverse (on a log scale), so ubiquitous words get very low scores.
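Here is a toy IDF computation to make that concrete. I’m using the common log(N / df) form; real implementations vary in smoothing details, and the corpus below is just made up:

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stochastic gradient descent",
]

def idf(word, documents):
    """log(N / df): high for rare, informative words; low for common ones."""
    n_docs = len(documents)
    df = sum(1 for d in documents if word in d.split())
    return math.log(n_docs / df) if df else 0.0

# "the" appears in 2 of 3 documents; "stochastic" in only 1:
print(idf("the", docs) < idf("stochastic", docs))  # → True
```

So a word that shows up everywhere contributes little information, while a rare word gets a high weight.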

You can check out the Wikipedia page for TF-IDF, an extension of IDF.

Additionally, one good paper on word2vec can be found here:

No worries, the whole purpose of AI6 platform is for everyone and anyone to learn about AI. Beginner questions are welcome! :slightly_smiling_face:


Thanks @jameslee for your helpful answer!