AMA with Nurture.AI Research Geeks 9 Feb 2018


These are the 5 papers that we will be covering in the AMA from 5am to 10am on Friday, 9th of Feb 2018, Coordinated Universal Time (UTC).

Deep Reinforcement Learning for Programming Language Correction

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

TransRev: Modeling Reviews as Translations from Users to Items

Generating Wikipedia by Summarizing Long Sequences

Image2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks

Status: Ended


I noticed abstractive and extractive summarization being mentioned repeatedly in the paper “Generating Wikipedia by Summarizing Long Sequences”. What is the difference between the two?

What does sequence transduction mean (see section 4.2.3)?



Extractive summarisation means copying portions of a text word-for-word to produce a condensed version of it.

Abstractive summarisation means producing a summary from the bottom up; the summary may paraphrase the original text rather than quote it.

Sequence transduction is the process of mapping a sentence to another sentence. For example, translating English to German.
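To make the extractive/abstractive distinction concrete, here is a minimal sketch of an extractive summariser that scores each sentence by word frequency and copies the top sentence verbatim. This is a toy illustration only; it is not the method used in the paper, and the function name and scoring scheme are my own.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    """Toy extractive summariser: score each sentence by the total
    document frequency of its words, then copy the top-scoring
    sentences verbatim (no paraphrasing, unlike abstractive methods)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    chosen = set(scored[:n_sentences])
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in chosen)

text = "Cats sleep. Cats eat and cats play and cats sleep again. Dogs bark."
print(extractive_summary(text))
```

An abstractive system would instead generate new wording for the same content, which is what the sequence-transduction models in the paper do.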


In the paper about Wikipedia, what is perplexity? Do better models have lower perplexity?


Perplexity is used on the testing set to gauge the performance of a text model.

An intuition behind perplexity is that a better NLP model is one that assigns a higher probability to the word that actually occurs. Mathematically, it’s (1/P(w1,w2,…,wn))^(1/n), where n is the number of words and wi is the ith word.

A lower perplexity would mean a better model.
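The formula above can be computed directly from the probabilities the model assigns to the words that actually occurred; a minimal sketch (the function name and example probabilities are illustrative):

```python
import math

def perplexity(probs):
    """Perplexity of a sequence given the model's per-word probabilities.

    probs: list of P(w_i | context) the model assigned to each word
    that actually occurred.  Perplexity = (1 / P(w1,...,wn))^(1/n),
    equivalently exp of the average negative log-probability.
    """
    n = len(probs)
    log_prob = sum(math.log(p) for p in probs)
    return math.exp(-log_prob / n)

# A model that is more confident about the true words scores lower:
better = perplexity([0.5, 0.4, 0.5])   # ≈ 2.15
worse = perplexity([0.1, 0.2, 0.1])    # ≈ 7.94
```

Note the averaging is done in log space to avoid numerical underflow on long sequences, which the direct product form would suffer from.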


Hi guys!

I have a question on the paper Deep Reinforcement Learning for Programming Language Correction.

In the Expert Demonstrations section, why are the demonstrations provided at the episodic level and not at a finer granularity? The paper says this is because the agent needs to take the right actions throughout the episode to reach the goal, and that if it only receives intermittent guidance it can still fail. Can someone explain this to me? I don’t see how it makes sense.

For example, if each action led to a reward, shouldn’t the agent still be able to reach the goal purely by maximizing the rewards it receives, regardless of whether the rewards are given intermittently or at the episodic level?


Hmmm, good question @KengLeeTay.

When you think about it, the goal state is to compile the C code without any errors. While there are multiple ways to achieve this state (e.g. DeepFix’s approach), this paper aims to have the agent do it in the fewest edits. Alternatively, you can say that the agent aims to fix errors in the best (and quickest) way possible.

If you gave rewards intermittently, an edit might earn a reward for being the best possible fix for that specific erroneous line of code. However, that edit is not guaranteed to be the best step towards fixing and compiling the program as a whole.
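One way to picture the episodic reward structure described above is a return that pays out only if the whole episode ends in a compiling program, with a small per-edit penalty to encourage the shortest fix. The reward values below are hypothetical, chosen for illustration; the paper’s exact reward scheme may differ.

```python
def episodic_return(edits, compiles, step_penalty=0.01):
    """Episode-level return: credit is given only if the final program
    compiles, and each edit costs a small penalty so that shorter
    successful fixes earn more.  (Illustrative values, not the paper's.)"""
    final_reward = 1.0 if compiles else 0.0
    return final_reward - step_penalty * edits

# A longer fix that still compiles earns less than a shorter one,
# and a locally plausible edit sequence that never compiles earns nothing:
episodic_return(edits=3, compiles=True)   # ≈ 0.97
episodic_return(edits=7, compiles=True)   # ≈ 0.93
episodic_return(edits=2, compiles=False)  # ≈ -0.02
```

Under this structure, an edit that looks good in isolation contributes nothing unless the episode as a whole reaches the goal state, which is the point James is making.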


I see, so it’s because we want the agent to learn to fix problems in the best way possible, not merely to fix individual errors.

Thanks James!