AMA with Nurture.AI Research Geeks 2 Feb 2018


These are the 5 papers that we will be covering in the AMA from 5am to 10am Coordinated Universal Time (UTC) on Friday, 2nd of Feb 2018.

  1. PointCNN
  2. Let’s Dance: Learning From Online Dance Videos
  3. PRNN: Recurrent Neural Network with Persistent Memory
  4. HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
  5. Personalizing Dialogue Agents: I have a dog, do you have pets too?

Status: Ended

Post AMA Edit: That’s it, folks, the AMA session has come to a close. We will not be answering any more questions. Drop by again next Friday for another round of AMAs on 5 new papers that the Research Geeks have read. We’ll be releasing our list again around Monday next week, so stay tuned!


Just came from Twitter, this seems fun :smile:


Hey. Great stuff here :slight_smile: What does the 5-fold cross-validation setup in HappyDB mean?


Hi Rachel! :slight_smile:

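In a k-fold cross-validation setup, the training data is randomly divided into k subsets (folds) of roughly equal size. Training is then performed in k trials: in each trial, one fold is held out as the validation set and the remaining k-1 folds are used as the training set. This is repeated until every fold has served as the validation set exactly once, and the k validation scores are typically averaged to give the final result. In a 5-fold setup, k = 5, so each trial trains on 80% of the data and validates on the remaining 20%.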
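Here is a minimal sketch of that splitting procedure in Python, using only the standard library. The function name `k_fold_splits` and its `seed` parameter are illustrative choices, not taken from the HappyDB paper, and the paper's actual setup may differ in details such as how folds are balanced.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) for each of the k trials.

    Illustrative sketch: samples are shuffled once, split into k folds of
    nearly equal size, and each fold is used as the validation set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division into subsets

    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    # Trial i: fold i is the validation set, all other folds form the training set.
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


if __name__ == "__main__":
    for trial, (train, val) in enumerate(k_fold_splits(10, 5, seed=42)):
        print(f"trial {trial}: {len(train)} train, {len(val)} validation")
```

In practice you would train a model on `train`, score it on `val` in each trial, and average the k scores.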


Hi @RouEnLee and @jameslee!

Nice to see you guys here again. :slight_smile:

I have a question this time on the 2nd paper, Let’s Dance: Learning From Online Dance Videos.

In Section 5.2 it says that the temporal 3D CNN (Skeletal) approach benefits from encoding the number of people in the frame, in addition to the motion over the 16 frames that are convolved in the temporal domain. I don’t understand why encoding the number of people would help here. Could you explain?


Hi Nurul! Welcome back!

The Temporal 3D CNN method uses only the visualized pose information, and it is common for people’s bodies to overlap in dance videos. Encoding the number of people lets the network keep track of how many posable bodies are in each frame, which in turn helps it pick up the difference between frames where bodies overlap and frames where they do not.

As an example, imagine a frame from a dance video with 5 people, where 2 of the 5 have their bodies overlapped in a complex dance move. The Temporal 3D CNN knows that there are 5 dancing bodies in that frame, but the pose visualization only shows 4 silhouettes. It can then look for poses that are less likely to be normal dance poses and work out whether one of them is actually 2 bodies rather than 1.
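To make this intuition concrete, here is a minimal Python/NumPy sketch. It assumes, hypothetically, that the person count is fed to the network as an extra constant input channel next to each rendered pose frame; the paper may encode it differently. `add_count_channel` and `overlap_candidates` are illustrative names, not from the paper, and `overlap_candidates` is a hand-written stand-in for the comparison the network would have to learn, not part of the model itself.

```python
import numpy as np


def add_count_channel(pose_clip: np.ndarray, person_counts: np.ndarray) -> np.ndarray:
    """Append a per-frame person-count channel to a clip of pose images.

    Hypothetical encoding (an assumption, not the paper's confirmed method):
      pose_clip:     (T, H, W) rendered pose images, e.g. T = 16 frames
      person_counts: (T,) number of people present in each frame
    Returns a (T, H, W, 2) array: channel 0 is the pose image, channel 1 is a
    constant plane filled with that frame's person count.
    """
    t, h, w = pose_clip.shape
    if person_counts.shape != (t,):
        raise ValueError("need exactly one person count per frame")
    # Broadcast each frame's scalar count over its full H x W plane.
    count_planes = np.broadcast_to(
        person_counts.astype(pose_clip.dtype)[:, None, None], (t, h, w)
    )
    return np.stack([pose_clip, count_planes], axis=-1)


def overlap_candidates(person_counts, silhouette_counts):
    """Frame indices where fewer silhouettes are visible than people are present,
    i.e. frames where some bodies are likely overlapping."""
    return [
        i
        for i, (people, silhouettes) in enumerate(zip(person_counts, silhouette_counts))
        if silhouettes < people
    ]


if __name__ == "__main__":
    clip = np.zeros((16, 64, 64), dtype=np.float32)  # 16 placeholder pose frames
    x = add_count_channel(clip, np.full(16, 5))
    print(x.shape)
    # 5 dancers in every frame, but frame 1 only shows 4 silhouettes.
    print(overlap_candidates([5, 5, 5], [5, 4, 5]))
```

With the count available as an input, a frame like frame 1 above carries a visible mismatch (5 people, 4 silhouettes) that the network can learn to use.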