Dive into the neuron

Overview

From the previous section (What is Neural Network), we learned that a neural network is a function composed of neurons, and that each neuron is itself a function.


A neuron can be split further into two sub-functions:

  • an $n$-ary linear function: $g(x_1, ..., x_n)$
  • a unary non-linear function: $h(x)$

The function represented by the neuron is:

$f(x_1, ..., x_n) = h(g(x_1, ..., x_n))$

Linear function $g(x_1, ..., x_n)$

The linear function has the following form:

$g(x_1, ..., x_n) = w_1x_1 + ... + w_nx_n + b$

Here, $w_1, ..., w_n, b$ are all parameters; different linear functions have different parameter values.
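The linear function above can be sketched in a few lines of code. The weights and bias values below are arbitrary, chosen only for illustration:

```python
def g(xs, ws, b):
    """n-ary linear function: weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(ws, xs)) + b

# Example with n = 3, weights (1, 2, 3) and bias 0.5:
print(g([1.0, 1.0, 1.0], [1.0, 2.0, 3.0], 0.5))  # 1 + 2 + 3 + 0.5 = 6.5
```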

Unary linear function

When $n = 1$, $g(x_1) = w_1x_1 + b$, and the graph of the function is a straight line:

(interactive plot with sliders: $w_1 = 1$, $b = 0$)

Binary linear function

When $n = 2$, $g(x_1, x_2) = w_1x_1 + w_2x_2 + b$, and the graph of the function is a plane:

(interactive plot with sliders: $w_1 = 0$, $w_2 = 1$, $b = 0$)

$n$-ary linear function

When $n > 2$, the graph of the function is a hyperplane. Beyond three dimensions this is hard to visualize, but you can imagine that its defining characteristic is still being "straight" (flat).

Non-linear function $h(x)$

As the name suggests, a non-linear function is any function that is not linear. Where a linear function is straight, a non-linear function is curved, such as the most common example, the Sigmoid function:
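The Sigmoid function follows the standard formula $\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$; a minimal sketch:

```python
import math

def sigmoid(x):
    """The Sigmoid function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5 (the curve passes through (0, 0.5))
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```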

Activation function

In neural networks, this unary non-linear function is called the activation function. For some common activation functions, please refer to activation function in the knowledge base. Note that:

  • Linear: $ f(x) = x $ is a linear function, which effectively means no non-linear function is applied
  • Softmax is a special case; strictly speaking, it is not an activation function
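A few common activation functions can be sketched directly from their standard formulas (Sigmoid, tanh, and ReLU here are illustrative examples, not an exhaustive list from the knowledge base):

```python
import math

def linear(x):
    return x                          # "no activation": output equals input

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # squashes into (0, 1)

def tanh(x):
    return math.tanh(x)               # squashes into (-1, 1)

def relu(x):
    return max(0.0, x)                # zero for negatives, identity otherwise
```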

Necessity

Why must the linear function be followed by a non-linear activation function?


This is because:

  1. If every neuron is a linear function, then the neural network composed of those neurons is also a linear function.

Consider the following example:

  • $f_1(x, y) = w_1x + w_2y + b_1$
  • $f_2(x, y) = w_3x + w_4y + b_2$
  • $f_3(x, y) = w_5x + w_6y + b_3$

Then the function represented by the entire neural network is:

$$
\begin{aligned}
&f_3(f_1(x_1, x_2), f_2(x_2, x_3)) \\
=\ &w_5(w_1x_1 + w_2x_2 + b_1) + w_6(w_3x_2 + w_4x_3 + b_2) + b_3 \\
=\ &(w_1w_5)x_1 + (w_2w_5 + w_3w_6)x_2 + (w_4w_6)x_3 + (w_5b_1 + w_6b_2 + b_3)
\end{aligned}
$$

This is a ternary linear function.
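The collapse above can be checked numerically. With arbitrary illustrative weights, composing $f_1$, $f_2$, $f_3$ gives exactly the same result as the collapsed ternary linear function:

```python
# Arbitrary illustration values for the weights and biases.
w1, w2, w3, w4, w5, w6 = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6
b1, b2, b3 = 1.0, 2.0, 3.0

def f1(x, y): return w1 * x + w2 * y + b1
def f2(x, y): return w3 * x + w4 * y + b2
def f3(x, y): return w5 * x + w6 * y + b3

def composed(x1, x2, x3):
    """The two-layer network: f3 applied to the outputs of f1 and f2."""
    return f3(f1(x1, x2), f2(x2, x3))

def collapsed(x1, x2, x3):
    """The single ternary linear function from the derivation above."""
    return (w1 * w5) * x1 + (w2 * w5 + w3 * w6) * x2 + (w4 * w6) * x3 \
         + (w5 * b1 + w6 * b2 + b3)

print(abs(composed(1.0, 2.0, 3.0) - collapsed(1.0, 2.0, 3.0)) < 1e-12)  # True
```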

  2. The objective function we need to construct can take many forms, and the linear function is just one of them.

We want neural networks to be able to simulate arbitrary functions, not just linear ones. So we add a non-linear activation function to "bend" the linear function.

Complete neuron

A complete neuron combines a linear function with a non-linear activation function, making it far more interesting and powerful.

Unary function

When $ n = 1 $, $ g(x_1) = w_1x_1 + b $. Using the Sigmoid activation function, the neuron's corresponding function is:

$h(g(x)) = \text{sigmoid}(wx + b)$

The graph of the function is:

(interactive plot with sliders: $w = 1$, $b = 0$)

Binary function

When $n = 2$, $g(x_1, x_2) = w_1x_1 + w_2x_2 + b$. Using the Sigmoid activation function, the neuron's corresponding function is:

$h(g(x)) = \text{sigmoid}(w_1x_1 + w_2x_2 + b)$

The graph of the function is:

(interactive plot with sliders: $w_1 = 0$, $w_2 = 1$, $b = 0$)
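Putting the two parts together, a complete neuron is just the linear function fed into the activation function. A minimal sketch, with arbitrary illustrative weights:

```python
import math

def neuron(xs, ws, b):
    """A complete neuron: linear function g followed by Sigmoid activation h."""
    z = sum(w * x for w, x in zip(ws, xs)) + b  # linear part g(x1, ..., xn)
    return 1.0 / (1.0 + math.exp(-z))           # non-linear part h = sigmoid

# Binary case: n = 2, with w = (1, 1) and b = -1.
print(neuron([0.5, 0.5], [1.0, 1.0], -1.0))  # sigmoid(0) = 0.5
```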

$n$-ary function

Due to the same visualization problem, this one is left entirely to your imagination! 😥

Question

Why can a neural network simulate complex functions by combining neurons?

Try to imagine intuitively how a slightly more complicated function could be simulated with simple neurons.