
Deep Learning 7: Deep Feedforward Networks (2)

2022-07-19 08:12:00 Water W

This article continues the previous post, Deep Learning 7: Deep Feedforward Networks (Water W's CSDN blog).

Contents

Automatic gradient computation

1. Numerical differentiation

2. Symbolic differentiation

3. Automatic differentiation

4. Computational graphs

5. Static and dynamic computational graphs

Main problems in neural network parameter optimization

1. Non-convex optimization

2. Vanishing gradients


Automatic gradient computation

Manually applying the chain rule to derive the gradient of every parameter and then programming it by hand is tedious and error-prone. Instead, parameter gradients can be computed automatically by the machine; the methods fall into three categories: numerical differentiation, symbolic differentiation, and automatic differentiation.

1. Numerical differentiation

Numerical differentiation computes the derivative of a function numerically. The derivative of 𝑓(𝑥) at a point 𝑥 is defined as

𝑓′(𝑥) = lim_{∆𝑥→0} (𝑓(𝑥 + ∆𝑥) − 𝑓(𝑥)) / ∆𝑥

In practice, the central-difference form below is often used instead, to reduce the truncation error:

𝑓′(𝑥) ≈ (𝑓(𝑥 + ∆𝑥) − 𝑓(𝑥 − ∆𝑥)) / (2∆𝑥)

* ∆𝑥 is hard to choose: too small causes rounding error, too large increases the truncation error;
* the method is very simple to implement, but of limited practical use;
* the computational complexity is high: each of the 𝑁 parameters must be perturbed separately, and each function evaluation itself costs 𝑂(𝑁), so the total cost is 𝑂(𝑁²) (see the sketch after this list).
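To make this concrete, here is a minimal NumPy sketch of the central-difference approximation (the quadratic test function and step size are illustrative choices, not from the original post). Note the two function evaluations per parameter, which is where the 𝑂(𝑁²) cost comes from:

```python
import numpy as np

def numerical_gradient(f, x, dx=1e-5):
    """Approximate the gradient of f at x by central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = dx
        # central difference: truncation error is O(dx^2) instead of O(dx)
        grad[i] = (f(x + e) - f(x - e)) / (2 * dx)
    return grad

f = lambda x: np.sum(x ** 2)      # test function with known gradient 2x
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(f, x))   # approximately [2. 4. 6.]
```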

2. Symbolic differentiation

* An automatic differentiation approach based on symbolic (algebraic) computation: the computer manipulates mathematical expressions that contain variables.
* Variables are treated as symbols and need not be bound to concrete values; both the input and the output are mathematical expressions.
* Covers rule-based simplification, factorization, differentiation, integration, solving algebraic equations, solving ordinary differential equations, and other operations.
* Compilation takes a long time.
* Requires a dedicated computer algebra language.
* Hard to debug.
* For deeply composed functions, the output expression becomes extremely long, a phenomenon known as expression swell (illustrated in the sketch after this list).
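A short sketch using SymPy (an assumed tool choice; the expressions are illustrative) shows the expression-in, expression-out style and hints at expression swell:

```python
import sympy

x = sympy.symbols('x')
expr = sympy.sin(x) * sympy.exp(x)

d = sympy.diff(expr, x)   # input and output are both symbolic expressions
print(d)                  # exp(x)*sin(x) + exp(x)*cos(x)

# expression swell: differentiating a deeply nested composite function
# yields a rapidly growing expression
nested = x
for _ in range(4):
    nested = sympy.sin(nested * nested)
print(sympy.count_ops(sympy.diff(nested, x)))   # operation count of the result
```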

3. Automatic differentiation

* A method between numerical differentiation and symbolic differentiation:
        • numerical differentiation substitutes concrete values from the start and approximates the derivative directly; symbolic differentiation first derives the full expression and substitutes values only at the end;
        • automatic differentiation applies symbolic differentiation rules only to the most basic operators (constants, power functions, exponential functions, logarithmic functions, trigonometric functions, etc.), substitutes concrete values, keeps the intermediate results, and finally combines them for the whole function.
* Highly flexible (see the sketch after this list):
        • the differentiation process is transparent to the user;
        • no dedicated mathematical language or special programming style is needed;
        • the computation is organized as a graph, which enables many optimizations.
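A minimal reverse-mode example using PyTorch's autograd (the function is an illustrative choice): elementary operations are recorded during the forward pass, and the chain rule combines their local derivatives on the backward pass.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.log(x) + x * torch.sin(x)   # composed from basic operators

y.backward()     # walk the recorded graph backward, applying the chain rule
print(x.grad)    # 1/x + sin(x) + x*cos(x), evaluated at x = 2
```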

4. Computational graphs

* A composite function is decomposed into a sequence of basic operations, which are then connected in the form of a graph.

* The result is a graph-structured representation of the computation: each non-leaf node represents a basic operation, and each leaf node represents an input variable or a constant (a hand-built sketch follows this list).
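As a sketch of this idea, the following code decomposes the logistic function 𝑓(𝑤, 𝑏, 𝑥) = 1/(1 + exp(−(𝑤𝑥 + 𝑏))) (an illustrative choice, not necessarily the function in the original figure) into elementary operations and backpropagates through them node by node:

```python
import math

def forward_backward(w, b, x):
    # forward pass: one elementary operation per intermediate node
    h1 = w * x               # multiply
    h2 = h1 + b              # add
    h3 = -h2                 # negate
    h4 = math.exp(h3)        # exp
    h5 = 1.0 + h4            # add constant
    f  = 1.0 / h5            # reciprocal
    # backward pass: chain rule from the output node back to w and b
    df_dh5 = -1.0 / h5 ** 2
    df_dh4 = df_dh5 * 1.0
    df_dh3 = df_dh4 * math.exp(h3)
    df_dh2 = df_dh3 * -1.0
    df_dw  = df_dh2 * x
    df_db  = df_dh2 * 1.0
    return f, df_dw, df_db

print(forward_backward(w=1.0, b=0.0, x=1.0))
# f = sigmoid(1) ~ 0.731; df/dw = df/db = sigma(1)*(1 - sigma(1)) ~ 0.197
```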

Example: a composite function decomposed into elementary operations (figure from the original post not reproduced here).

Computational graph of a regularized single-hidden-layer MLP (figure from the original post not reproduced here).

5. Static and dynamic computational graphs

Among current deep learning frameworks, Theano and TensorFlow use static computational graphs, while DyNet, Chainer, and PyTorch use dynamic graphs. TensorFlow 2.0 has also begun to support dynamic computational graphs.
Static computational graph (Static Computational Graph)
        * The graph is built at compile time and cannot be changed while the program is running.
        * It can be optimized at build time and is easier to parallelize.
        * Less flexible.
Dynamic computational graph (Dynamic Computational Graph)
        * The graph is built on the fly as the program runs.
        * Harder to optimize; when different inputs use different network structures, parallel execution is difficult.
        * More flexible (see the sketch below).
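A small PyTorch sketch of why dynamic graphs are flexible but hard to optimize: the control flow below depends on runtime values, so the recorded graph can differ from one input to the next (the loop condition is an illustrative choice). In TensorFlow 2.0, decorating such a function with @tf.function traces it into a static graph instead.

```python
import torch

def forward(x):
    h = x
    # how many doublings run depends on the runtime value of x,
    # so the graph is rebuilt on every call
    while h.norm() < 10:
        h = 2 * h
    return h.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
forward(x).backward()
print(x.grad)   # tensor([8., 8.]) for this input (three doublings)
```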

Main problems in neural network parameter optimization

1. Non-convex optimization

The optimization problem of a neural network is a non-convex optimization problem.
Take the simplest two-layer network with a 1-1-1 structure as an example:

𝑦 = 𝜎(𝑤2 𝜎(𝑤1 𝑥))

where 𝑤1 and 𝑤2 are the network parameters and 𝜎(∙) is the logistic activation function.

Given a single sample (𝑥, 𝑦) = (1, 1), the squared error loss and the cross-entropy loss, viewed as functions of the parameters 𝑤1 and 𝑤2, are both non-convex (the surface plots from the original post are not reproduced here; the sketch below evaluates one of them on a grid).
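A sketch that evaluates the squared-error loss of this network over a grid of (𝑤1, 𝑤2) values (the grid range is an arbitrary illustrative choice); plotting `loss` over the grid shows a non-convex surface of the kind the original figures depict:

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

# grid of parameter values (range chosen for illustration)
w1, w2 = np.meshgrid(np.linspace(-10, 10, 201), np.linspace(-10, 10, 201))

y_hat = sigma(w2 * sigma(w1 * 1.0))   # forward pass for input x = 1
loss = (1.0 - y_hat) ** 2             # squared error against label y = 1

# the surface has flat saturated plateaus and is not convex;
# plot `loss` over (w1, w2) to see this directly
print(loss.min(), loss.max())
```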

2. Vanishing gradients

The error decays, and may even vanish, as it is propagated backward through the layers.

* Recall the backpropagation recursion for the error term from the previous post:

δ^(l) = f′_l(z^(l)) ⊙ ((W^(l+1))ᵀ δ^(l+1))

  As the error propagates through each layer, it is multiplied by the derivative of that layer's activation function.

* For sigmoid-type activation functions, the derivatives are:

σ′(x) = σ(x)(1 − σ(x))        tanh′(x) = 1 − tanh(x)²

  Both derivatives are at most 1 (the logistic derivative is at most 0.25), and in the saturated regions they are close to 0. The backpropagated gradient therefore decays, and may even vanish, making the whole network difficult to train (see the sketch below).
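A PyTorch sketch of the effect: a 20-layer chain of one-unit logistic layers, where the gradient magnitude typically shrinks roughly geometrically toward the early layers (the depth is an illustrative choice, and exact numbers vary with the random initialization):

```python
import torch

depth = 20
layers = [torch.nn.Linear(1, 1) for _ in range(depth)]

h = torch.tensor([[1.0]])
for layer in layers:
    h = torch.sigmoid(layer(h))   # each layer multiplies the backward
                                  # error by sigmoid'(z) <= 0.25

loss = ((h - 1.0) ** 2).sum()
loss.backward()

# gradient magnitudes shrink from the last layer toward the first
for i in [0, depth // 2, depth - 1]:
    print(f"layer {i:2d} |dL/dW| = {layers[i].weight.grad.abs().item():.3e}")
```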

 

Source and copyright: this article was written by Water W; please include the original link when reposting:
https://yzsam.com/2022/200/202207170649064356.html