From the previous section (What is a Neural Network), we learned that a neural network is a function composed of neurons, and that each neuron is itself a function.
A neuron can be further split into two sub-functions:
an n-ary linear function: $g(x_1, \dots, x_n)$
a unary non-linear function: $h(x)$
The function represented by the neuron is:
$f(x_1, \dots, x_n) = h(g(x_1, \dots, x_n))$
Linear function $g(x_1, \dots, x_n)$
The linear function has the following form:
$g(x_1, \dots, x_n) = w_1x_1 + \dots + w_nx_n + b$
Here, $w_1, \dots, w_n, b$ are all parameters; different linear functions have different parameter values.
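For concreteness, here is a minimal Python sketch of this linear function (the parameter values below are arbitrary, chosen only for illustration):

```python
import numpy as np

def g(x, w, b):
    """n-ary linear function: g(x1, ..., xn) = w1*x1 + ... + wn*xn + b."""
    return np.dot(w, x) + b

x = np.array([1.0, 3.0])   # inputs x1, x2
w = np.array([0.5, -1.0])  # weights w1, w2 (arbitrary)
b = 2.0                    # bias (arbitrary)
print(g(x, w, b))          # 0.5*1.0 + (-1.0)*3.0 + 2.0 = -0.5
```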
Unary linear function
When $n = 1$, $g(x_1) = w_1x_1 + b$, and the graph of the function is a straight line:
Binary linear function
When $n = 2$, $g(x_1, x_2) = w_1x_1 + w_2x_2 + b$, and the graph of the function is a plane:
n-ary linear function
When $n > 2$, the graph of the function is a hyperplane. Beyond three dimensions, visualization is no longer convenient, but you can imagine that its defining characteristic is still being "straight".
Non-linear function $h(x)$
As the name suggests, a non-linear function is simply any function that is not linear. A linear function is straight; a non-linear function is curved. The most common example is the Sigmoid function:
$\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
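In code, the Sigmoid function is a one-liner; this is a minimal sketch assuming numpy is available:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~0.99995, close to 1
print(sigmoid(-10.0))  # ~0.00005, close to 0
```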
Activation function
In neural networks, this unary non-linear function is called the activation function. For some common activation functions, please refer to the activation function entry in the knowledge base, where:
Linear: $f(x) = x$ is a linear function, which amounts to not using a non-linear function at all
Softmax is a special case; strictly speaking, it is not an activation function
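For reference, here is a minimal sketch of a few other common activation functions (an illustrative selection, not the full list from the knowledge base):

```python
import numpy as np

def linear(x):
    return x                   # the identity: equivalent to using no activation

def relu(x):
    return np.maximum(0.0, x)  # Rectified Linear Unit: clips negatives to zero

def tanh(x):
    return np.tanh(x)          # squashes input into (-1, 1)
```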
Necessity
Why must the linear function be followed by a non-linear activation function?
This is because:
If every neuron were just a linear function, then the neural network composed of those neurons would also be a linear function.
Consider the following example:
$f_1(x, y) = w_1x + w_2y + b_1$
$f_2(x, y) = w_3x + w_4y + b_2$
$f_3(x, y) = w_5x + w_6y + b_3$
Then, with $f_3$ taking the outputs of $f_1$ and $f_2$ as its inputs, the function represented by the entire neural network is:
$f_3(f_1(x, y), f_2(x, y)) = w_5(w_1x + w_2y + b_1) + w_6(w_3x + w_4y + b_2) + b_3$
$= (w_5w_1 + w_6w_3)x + (w_5w_2 + w_6w_4)y + (w_5b_1 + w_6b_2 + b_3)$
which is still a linear function of $x$ and $y$.
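The same collapse can be checked numerically. Below is a minimal sketch with arbitrary weight values:

```python
# Arbitrary illustrative parameter values.
w1, w2, b1 = 1.0, 2.0, 0.5
w3, w4, b2 = -1.0, 0.5, 1.0
w5, w6, b3 = 3.0, -2.0, 0.25

def f1(x, y): return w1 * x + w2 * y + b1
def f2(x, y): return w3 * x + w4 * y + b2
def f3(x, y): return w5 * x + w6 * y + b3

def network(x, y):
    """The two-layer network: f3 applied to the outputs of f1 and f2."""
    return f3(f1(x, y), f2(x, y))

def collapsed(x, y):
    """The equivalent single linear function, coefficients expanded by hand."""
    return ((w5 * w1 + w6 * w3) * x
            + (w5 * w2 + w6 * w4) * y
            + (w5 * b1 + w6 * b2 + b3))

print(network(2.0, -1.0), collapsed(2.0, -1.0))  # both print 4.75
```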
The target functions we need to model come in all shapes, and linear functions are only one special case.
We want neural networks to be able to simulate arbitrary functions, not just linear ones. So we add a non-linear activation function to "bend" the linear function.
Complete neuron
A complete neuron combines a linear function with a non-linear activation function, which makes it far more interesting and powerful.
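In code, a complete neuron is just the composition of the two pieces we have seen; a minimal sketch, assuming numpy and the Sigmoid defined earlier:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuron(x, w, b):
    """A complete neuron: a linear function followed by a non-linear activation.
    f(x1, ..., xn) = sigmoid(w1*x1 + ... + wn*xn + b)
    """
    return sigmoid(np.dot(w, x) + b)

# Example with arbitrary parameters:
print(neuron(np.array([1.0, 2.0]), np.array([0.3, -0.8]), 0.1))  # ~0.231
```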
Unary function
When $n = 1$, $g(x_1) = w_1x_1 + b$. Using the Sigmoid activation function, the neuron's corresponding function is:
$h(g(x)) = \mathrm{sigmoid}(wx + b)$
The graph of the function is:
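If you want to reproduce a graph like this yourself, here is a minimal plotting sketch (assuming matplotlib; the parameter values are arbitrary, so try changing them to see the curve shift and stretch):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w, b = 2.0, -1.0  # arbitrary parameters
x = np.linspace(-10, 10, 200)
plt.plot(x, sigmoid(w * x + b))
plt.xlabel("x")
plt.ylabel("sigmoid(w*x + b)")
plt.show()
```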
Binary function
When $n = 2$, $g(x_1, x_2) = w_1x_1 + w_2x_2 + b$. Using the Sigmoid activation function, the neuron's corresponding function is:
$h(g(x_1, x_2)) = \mathrm{sigmoid}(w_1x_1 + w_2x_2 + b)$
The graph of the function is:
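The binary case can be plotted as a 3D surface; again a minimal sketch with arbitrary parameters, assuming matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w1, w2, b = 1.0, -2.0, 0.5  # arbitrary parameters
x1, x2 = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-5, 5, 100))
z = sigmoid(w1 * x1 + w2 * x2 + b)

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(x1, x2, z)
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("f(x1, x2)")
plt.show()
```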
n-ary function
Because of the visualization problem, here we can only rely on our imagination! 😥
Question
Why can a neural network simulate complex functions by combining neurons?
Try to imagine intuitively how a slightly more complicated function could be simulated using simple neurons.
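As one hint (a minimal sketch, not the only possible construction): subtracting two shifted, steep Sigmoid neurons already produces a localized "bump", and sums of such bumps can trace out quite complicated curves:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 400)
# Two steep sigmoid neurons stepping up at x = -2 and x = +2;
# their difference is roughly 1 on (-2, 2) and 0 elsewhere: a "bump".
bump = sigmoid(5 * (x + 2)) - sigmoid(5 * (x - 2))
plt.plot(x, bump)
plt.title("Difference of two sigmoid neurons: a bump")
plt.show()
```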