Discussion pointers:

- Why take the exponential?
- Why is it important to normalise?
- What does minimising the negative log probability do?
- Why do we take the negative in the cross-entropy error?
- How does the regularisation term change the loss function?
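
To make the pointers above concrete, here is a minimal NumPy sketch of a softmax classifier loss. It assumes integer class labels and an L2 penalty; the function names, the `lam` strength and the toy numbers are all hypothetical and not taken from the lecture or the tutorial notebook. It is meant as something to poke at during the discussion, not as a reference implementation.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities, row by row."""
    # Subtracting the row maximum does not change the result but avoids overflow.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)  # the exponential makes every entry positive
    return exp / np.sum(exp, axis=-1, keepdims=True)  # normalising makes each row sum to 1

def cross_entropy(probs, labels):
    """Mean negative log probability assigned to the correct class."""
    n = probs.shape[0]
    # log(p) <= 0 for p in (0, 1], so the minus sign gives a non-negative loss.
    return -np.mean(np.log(probs[np.arange(n), labels]))

def regularised_loss(scores, labels, weights, lam=1e-3):
    """Cross-entropy plus an L2 penalty on the weights (assumed form)."""
    data_loss = cross_entropy(softmax(scores), labels)
    penalty = 0.5 * lam * np.sum(weights ** 2)
    return data_loss + penalty

# Toy inputs (hypothetical): 2 examples, 3 classes.
scores = np.array([[2.0, 1.0, -1.0],
                   [0.5, 0.5, 0.5]])
labels = np.array([0, 2])
weights = np.ones((4, 3))

probs = softmax(scores)
print(probs.sum(axis=1))                                  # each row sums to 1
print(cross_entropy(probs, labels))                       # data loss only
print(regularised_loss(scores, labels, weights, lam=0.1)) # data loss plus penalty
```

Things to try: remove the exponential and see what happens with negative scores, drop the normalisation, or increase `lam` and watch how much of the total loss comes from the weights rather than from the data.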

Additional Learning Materials:

- Vector notation cheat sheet, and another specific to derivatives and integrals
- Important vector calculus identities
- Matrix notations and operations - cs231n notes on backprop and network architectures
- Review of differential calculus
- Natural Language Processing (almost) from Scratch
- Learning Representations by Backpropagating Errors

Assignment:

- Replicate this Neural Network in Numpy Tutorial without looking at the notebook, then complete the exercises at the end of the notebook.
- Reply to this thread with an overview of the activation functions commonly used in neural networks.