- What is Neural Network?
- Dive into the neuron
- How does a neural network simulate an arbitrary function
- Why do we need neural networks

- How to construct a neural network
- Fully connected neural network
- Use graphical tool to design neural network
- The "activation function" of the output layer

- How to train a neural network
- Learning algorithm and principle
- Build and train neural networks from scratch
- Rewrite the code using PyTorch
- Use graphical tool to train neural network

- Some important problems of neural network
- Network structure
- Overfitting
- Underfitting
- Overfitting vs underfitting
- Initialization
- Vanishing gradient and exploding gradient

- Convolutional Neural Network (CNN)
- 1D-convolution
- 1D-convolution experiments
- 1D-pooling
- 1D-CNN experiments
- 2D-CNN
- 2D-CNN experiments

- Recurrent Neural Network (RNN)
- Vanilla RNN
- Seq2seq, Autoencoder, Encoder-Decoder
- Advanced RNN
- RNN classification experiment

- Natural language processing
- Embedding: Convert symbols to values
- Text Classification 1
- Text Classification 2
- TextCNN
- Entity recognition
- Word segmentation, POS tagging and chunking
- Sequence tagging in action
- Bidirectional RNN
- BI-LSTM-CRF
- Attention

- Language Models
- n-gram Model: Unigram
- n-gram Model: Bigram
- n-gram Model: Trigram
- RNN Language Model
- Transformer Language Model

- Linear Algebra
- Vector
- Matrix
- Dive in matrix multiplication
- Tensor

From the previous section (What is Neural Network), we learned that a neural network is a function composed of neurons, and that each neuron is itself a function.

A neuron can be further split into two sub-functions:

- an $n$-ary linear function: $g(x_1, ..., x_n)$
- a unary non-linear function: $h(x)$

The function represented by the neuron is:

$f(x_1, ..., x_n) = h(g(x_1, ..., x_n))$

The linear function has the following form:

$g(x_1, ..., x_n) = w_1x_1 + ... + w_nx_n + b$

Here $w_1, ..., w_n, b$ are all parameters; different linear functions have different parameter values.

When $n = 1$, $g(x_1) = w_1x_1 + b$, and the graph of the function is a straight line:

When $n = 2$, $g(x_1, x_2) = w_1x_1 + w_2x_2 + b$, and the graph of the function is a plane:

When $n > 2$, the graph of the function is a hyperplane. Beyond three dimensions it is hard to visualize, but you can imagine that it shares the same defining characteristic: it is "straight" (flat).
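For any $n$, the linear part $g$ is just a weighted sum plus a bias, which can be written as a dot product. A minimal sketch (the weights and bias below are made-up example values):

```python
import numpy as np

# The n-ary linear function g(x_1, ..., x_n) = w_1*x_1 + ... + w_n*x_n + b,
# expressed as a dot product. Parameters here are arbitrary example values.
def g(x, w, b):
    return np.dot(w, x) + b

w = np.array([2.0, -1.0])   # w_1, w_2
b = 0.5
print(g(np.array([1.0, 3.0]), w, b))  # 2*1 + (-1)*3 + 0.5 = -0.5
```

The same `g` works unchanged for any $n$: only the lengths of `x` and `w` change.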

As the name suggests, a non-linear function is any function that is not linear: where a linear function is straight, a non-linear function is curved. The most common example is the Sigmoid function:

In neural networks, this unary non-linear function is called the **activation function**. For some common activation functions, refer to the activation function entry in the knowledge base. Two notes:

- **Linear**: $f(x) = x$ is a linear function, which means no non-linearity is applied.
- **Softmax** is a special case: strictly speaking, it is not an activation function.
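The shapes of these functions are easy to see in code. A minimal sketch of a few common activations (not a complete reference):

```python
import numpy as np

# Common activation functions, defined element-wise on NumPy arrays.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def linear(x):
    # The "Linear" activation is the identity: no non-linearity at all.
    return x

def softmax(x):
    # Softmax maps a whole vector to a probability distribution;
    # strictly speaking it is not an element-wise activation function.
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                   # 0.5
print(relu(np.array([-1.0, 2.0])))    # [0. 2.]
print(softmax(np.array([1.0, 1.0])))  # [0.5 0.5]
```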

Why must the linear function be followed by a non-linear activation function?

This is because:

- If every neuron is a linear function, then the neural network composed of those neurons is also a linear function.

Such as the following example:

- $f_1(x, y) = w_1x + w_2y + b_1$
- $f_2(x, y) = w_3x + w_4y + b_2$
- $f_3(x, y) = w_5x + w_6y + b_3$

Then, with $f_1$ and $f_2$ in the first layer feeding $f_3$ in the second, the function represented by the entire neural network is:

$\begin{aligned} &f_3(f_1(x_1, x_2), f_2(x_1, x_2)) \\ = \ &w_5(w_1x_1 + w_2x_2 + b_1) + w_6(w_3x_1 + w_4x_2 + b_2) + b_3 \\ = \ &(w_1w_5 + w_3w_6)x_1 + (w_2w_5 + w_4w_6)x_2 + (w_5b_1 + w_6b_2 + b_3) \end{aligned}$

This is again a binary linear function.
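We can check this numerically: composing the linear neurons $f_1$, $f_2$, $f_3$ gives exactly the same values as a single linear function with combined coefficients. The weights below are arbitrary example values:

```python
import numpy as np

# Arbitrary example parameters for the three linear neurons.
w1, w2, b1 = 0.3, -0.7, 0.1
w3, w4, b2 = 1.2,  0.5, -0.4
w5, w6, b3 = 2.0, -1.0, 0.9

def f1(x, y): return w1*x + w2*y + b1
def f2(x, y): return w3*x + w4*y + b2
def f3(x, y): return w5*x + w6*y + b3

def network(x1, x2):
    # Two-layer "network": f1 and f2 feed into f3.
    return f3(f1(x1, x2), f2(x1, x2))

def collapsed(x1, x2):
    # The expanded single linear function with combined coefficients.
    return (w1*w5 + w3*w6)*x1 + (w2*w5 + w4*w6)*x2 + (w5*b1 + w6*b2 + b3)

for x1, x2 in [(0.0, 0.0), (1.5, -2.0), (3.0, 4.0)]:
    assert np.isclose(network(x1, x2), collapsed(x1, x2))
print("composing linear functions gives a linear function")
```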

- The objective function we want the network to model can take many forms, and a linear function is only one of them.

We want neural networks to be able to simulate arbitrary functions, not just linear ones. So we add a non-linear activation function to "bend" the linear function.

The complete neuron combines a linear function and a non-linear activation function, making it more interesting and powerful.

When $n = 1$, $g(x_1) = w_1x_1 + b$; using the Sigmoid activation function, the neuron's corresponding function is:

$h(g(x_1)) = \text{sigmoid}(w_1x_1 + b)$

The graph of the function is:

When $n = 2$, $g(x_1, x_2) = w_1x_1 + w_2x_2 + b$; using the Sigmoid activation function, the neuron's corresponding function is:

$h(g(x_1, x_2)) = \text{sigmoid}(w_1x_1 + w_2x_2 + b)$

The graph of the function is:
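The complete neuron is just these two steps chained together. A minimal sketch with example parameters:

```python
import numpy as np

# A single neuron f(x1, x2) = sigmoid(w1*x1 + w2*x2 + b).
# The default parameters are arbitrary example values.
def neuron(x1, x2, w1=1.0, w2=1.0, b=0.0):
    z = w1*x1 + w2*x2 + b            # the linear part g
    return 1.0 / (1.0 + np.exp(-z))  # the non-linear part h (Sigmoid)

print(neuron(0.0, 0.0))  # sigmoid(0) = 0.5
```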

Due to the difficulty of visualization beyond this point, it is entirely up to your own imagination! 😥

You can now intuitively imagine how simple neurons can be combined to simulate slightly more complicated functions.
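One way to build that intuition (this is an illustrative construction, not from the text above): the difference of two shifted, steep Sigmoid neurons forms a localized "bump", a basic building block for approximating more complicated functions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Difference of two sigmoids: steps up near `left`, back down near `right`.
# k controls steepness; all parameter values here are illustrative.
def bump(x, k=10.0, left=-1.0, right=1.0):
    return sigmoid(k * (x - left)) - sigmoid(k * (x - right))

print(bump(0.0))  # close to 1: inside the interval [-1, 1]
print(bump(5.0))  # close to 0: far outside the interval
```

Summing many such bumps with different positions and heights lets the network trace out an arbitrary curve, which is the intuition behind the next section.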