Deep learning 7 deep feedforward network 2
2022-07-19 08:12:00 【Water W】
This article continues from the previous post, "Deep learning 7 Deep feedforward network" (Water W's blog, CSDN).
Contents
Automatic gradient calculation
5、 Static and dynamic computational graphs
Main problems in neural network parameter optimization
Automatic gradient calculation
Applying the chain rule by hand to derive the gradient of every parameter, and then coding the result, is tedious and error-prone. A computer can compute the parameter gradients automatically. The methods fall into three types: numerical differentiation, symbolic differentiation and automatic differentiation.
1、 Numerical differentiation

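Numerical differentiation approximates the derivative with a finite difference, 𝑓′(𝑥) ≈ (𝑓(𝑥 + ∆𝑥) − 𝑓(𝑥)) / ∆𝑥, for a small nonzero ∆𝑥. In practice, the central difference 𝑓′(𝑥) ≈ (𝑓(𝑥 + ∆𝑥) − 𝑓(𝑥 − ∆𝑥)) / (2∆𝑥) is often used to compute the gradient, because it reduces the truncation error.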
* ∆𝑥 is hard to choose: too small a value causes rounding error, too large a value increases the truncation error;
* Although the implementation is very simple, the method is of little practical use;
* The computational complexity is high, because each parameter has to be perturbed separately. If the number of parameters is 𝑁 and one forward evaluation costs 𝑂(𝑁), the total complexity is 𝑂(𝑁²).
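The trade-offs listed above can be tried out with a minimal sketch. It assumes that the method meant in this section is the central difference; the function names (`forward_difference`, `central_difference`, `numerical_gradient`) and the test function `sin` are illustrative choices, not taken from the original article.

```python
import math

def forward_difference(f, x, dx):
    """One-sided approximation: (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

def central_difference(f, x, dx):
    """Two-sided approximation: (f(x + dx) - f(x - dx)) / (2 * dx)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

def numerical_gradient(f, params, dx=1e-5):
    """Perturb each parameter separately: 2 * len(params) evaluations of f."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += dx
        down = list(params)
        down[i] -= dx
        grad.append((f(up) - f(down)) / (2 * dx))
    return grad

if __name__ == "__main__":
    exact = math.cos(1.0)  # exact derivative of sin at x = 1
    # A large dx shows truncation error, a tiny dx shows rounding error.
    for dx in (1e-1, 1e-5, 1e-13):
        fwd = abs(forward_difference(math.sin, 1.0, dx) - exact)
        ctr = abs(central_difference(math.sin, 1.0, dx) - exact)
        print(f"dx={dx:g}  forward error={fwd:.2e}  central error={ctr:.2e}")
    print(numerical_gradient(lambda p: p[0] ** 2 + 3 * p[1], [2.0, 1.0]))
```

Because `numerical_gradient` calls `f` twice per parameter, its cost grows with the number of parameters times the cost of one evaluation, which is the quadratic behaviour noted in the list.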
2、 Symbolic differentiation
* A way of computing derivatives automatically that is based on symbolic (algebraic) computation: a computer manipulates mathematical expressions that contain variables;
* Variables are treated as symbols and no concrete values need to be substituted; both the input and the output are mathematical expressions;
* It covers rule-based operations such as simplification, factorization, differentiation, integration, solving algebraic equations and solving ordinary differential equations;
* Long compilation time;
* A dedicated mathematical computing language is usually required;
* It is difficult to debug;
* For deeply nested composite functions the output expression becomes very long, a problem known as expression swell;
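To make the "expression in, expression out" idea and expression swell concrete, here is a toy sketch. The tuple encoding `(op, a, b)` and the helper names `diff`, `evaluate` and `size` are assumptions made for illustration. It supports only `+` and `*` and deliberately performs no simplification.

```python
def diff(expr, var):
    """Symbolic derivative of expr with respect to var, without simplification."""
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):            # a variable name
        return 1 if expr == var else 0
    op, a, b = expr
    if op == "+":                        # sum rule
        return ("+", diff(a, var), diff(b, var))
    if op == "*":                        # product rule
        return ("+", ("*", diff(a, var), b), ("*", a, diff(b, var)))
    raise ValueError(f"unsupported operator: {op}")

def evaluate(expr, env):
    """Substitute the values in env and compute the result."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, a, b = expr
    left, right = evaluate(a, env), evaluate(b, env)
    return left + right if op == "+" else left * right

def size(expr):
    """Number of nodes in the expression tree."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1

if __name__ == "__main__":
    f = "x"
    for depth in range(1, 5):
        f = ("*", f, f)                  # nest the expression one level deeper
        print(depth, size(f), size(diff(f, "x")))
```

The printed sizes show the derivative expression growing faster than the original one as the nesting gets deeper, which is why real symbolic systems rely heavily on simplification.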
3、 Automatic differentiation
* A method that sits between numerical differentiation and symbolic differentiation
  • Numerical differentiation substitutes values from the start and computes an approximate solution; symbolic differentiation first derives the full expression and substitutes values only at the end;
  • Automatic differentiation applies symbolic differentiation to the most basic operators, such as constants, power functions, exponential functions, logarithmic functions and trigonometric functions, substitutes values into them, keeps the intermediate results, and finally combines them for the whole function;
* High flexibility
  • The differentiation process is transparent to the user;
  • No special mathematical language or programming is required;
  • The computation is carried out on a graph, which allows many optimizations;
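One way to see "symbolic rules on basic operators, values substituted immediately" is forward-mode automatic differentiation with dual numbers. The sketch below is only an illustration under that assumption: the `Dual` class and the helpers `exp`, `sin` and `derivative` are hypothetical names, and deep learning frameworks typically use reverse mode rather than this forward mode.

```python
import math

class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        """Wrap a plain number as a constant (derivative 0)."""
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule applied to numbers, not to expressions.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def exp(d):
    e = math.exp(d.value)
    return Dual(e, e * d.deriv)

def sin(d):
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

def derivative(f, x):
    """Evaluate f at x with the derivative seed set to 1."""
    return f(Dual(x, 1.0)).deriv

if __name__ == "__main__":
    print(derivative(lambda x: x * x + 3 * x, 2.0))
    print(derivative(lambda x: exp(sin(x)), 0.5))
```

Each operator carries its own differentiation rule, and only numbers flow through the computation, so no large derivative expression is ever built.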
4、 Computational graph
* A composite function is decomposed into a series of basic operations, which are connected in the form of a graph;
* It is a graph-structured representation of a mathematical computation: each non-leaf node represents a basic operation, and each leaf node represents an input variable or a constant;
Example: (figure)
Computational graph of a regularized single-hidden-layer MLP: (figure)
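As a rough sketch of how a computational graph supports gradient computation, the code below builds a graph for an assumed example function, the logistic unit 1 / (exp(−(𝑤𝑥 + 𝑏)) + 1), out of basic operations and then runs a reverse pass. The `Node` class, the operation helpers and `backward` are hypothetical names for illustration. This is neither the MLP graph from the figure nor any framework's real API.

```python
import math

class Node:
    """A graph vertex: leaves hold inputs, other nodes hold an operation's result."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuples of (parent_node, d(self)/d(parent))
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def neg(a):
    return Node(-a.value, ((a, -1.0),))

def exp(a):
    v = math.exp(a.value)
    return Node(v, ((a, v),))

def reciprocal(a):
    return Node(1.0 / a.value, ((a, -1.0 / a.value ** 2),))

def backward(output):
    """Reverse pass: walk from the output to the leaves, applying the chain rule."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)   # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

if __name__ == "__main__":
    x, w, b, one = Node(1.0), Node(0.5), Node(-0.5), Node(1.0)
    h = reciprocal(add(exp(neg(add(mul(w, x), b))), one))
    backward(h)
    print(h.value, w.grad, b.grad, x.grad)
```

Each helper stores the local derivative next to the edge it creates, so the reverse pass only has to multiply and accumulate the intermediate results kept during the forward computation.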
5、 Static and dynamic computational graphs
In current deep learning frameworks, Theano and TensorFlow use static computational graphs, while DyNet, Chainer and PyTorch use dynamic computational graphs. TensorFlow 2.0 also began to support dynamic computational graphs.
Static computational graph
* The graph is built at compile time and cannot be changed while the program is running
* It can be optimized at construction time and parallelizes well
* Poor flexibility
Dynamic computational graph
* The graph is built dynamically while the program is running
* It is harder to optimize; when different inputs use different network structures, it is difficult to compute in parallel
* More flexible
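The contrast can be mimicked in plain Python. This is a toy sketch only: it does not use or imitate the real APIs of the frameworks named above, and the mini graph format, `OPS`, `run_static` and `run_dynamic` are all made up for illustration.

```python
# Static style: the graph is a fixed data structure built before any input exists.
def build_static_graph():
    # Each entry is (output_name, operation, input_names); a hypothetical format.
    return [
        ("z", "mul", ("w", "x")),
        ("y", "add", ("z", "b")),
    ]

OPS = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}

def run_static(graph, feed):
    """Execute the same prebuilt graph for whatever values are fed in."""
    env = dict(feed)
    for out, op, inputs in graph:
        env[out] = OPS[op](*(env[name] for name in inputs))
    return env["y"]

# Dynamic style: operations are recorded while ordinary code runs,
# so the structure can follow the input.
def run_dynamic(xs, w, b):
    trace = []          # operations recorded as they happen
    y = b
    for x in xs:        # the loop length, and so the graph size, depends on the input
        z = w * x
        trace.append("mul")
        y = y + z
        trace.append("add")
    return y, trace

if __name__ == "__main__":
    graph = build_static_graph()
    print(run_static(graph, {"w": 2.0, "x": 3.0, "b": 1.0}))
    print(run_dynamic([1.0, 2.0], 2.0, 1.0))
    print(run_dynamic([1.0, 2.0, 3.0], 2.0, 1.0))
```

In the static case the whole list of operations is known up front, which is what makes ahead-of-time optimization possible. In the dynamic case the recorded trace changes with each input, which is more flexible but harder to optimize and batch.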
Main problems in neural network parameter optimization
1、 Nonconvex Optimization

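The loss function of a neural network is a non-convex function of its parameters, even for a very small network. In the example, 𝑤₁ and 𝑤₂ are the network parameters and 𝜎(·) is the logistic activation function.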
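Non-convexity can be checked numerically on a tiny case. The network form 𝜎(𝑤₂𝜎(𝑤₁𝑥)), the squared-error loss and the single training pair (𝑥, 𝑦) = (1, 1) are assumptions made for this sketch, since the original formula is not preserved here. The helper `violates_convexity` is likewise hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, x=1.0, y=1.0):
    """Squared error of the assumed network y_hat = sigmoid(w2 * sigmoid(w1 * x))."""
    y_hat = sigmoid(w2 * sigmoid(w1 * x))
    return (y - y_hat) ** 2

def violates_convexity(a, b):
    """True if the loss at the midpoint of a and b lies above the chord between them."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return loss(*mid) > (loss(*a) + loss(*b)) / 2

if __name__ == "__main__":
    a, b = (0.0, -20.0), (0.0, 0.0)
    print(loss(*a), loss(*b), violates_convexity(a, b))
```

A convex function never rises above the straight line joining two of its points, so a single pair of parameter settings that breaks this rule is enough to show that the loss is not convex.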
2、 Vanishing gradient
The error decays, and may even vanish, as it is propagated backward through the network.

Each time the error is propagated back through a layer, it is multiplied by the derivative of that layer's activation function.

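The derivatives of sigmoid-type activation functions such as the logistic function and tanh are both at most 1 (at most 0.25 for the logistic function), and in the saturation regions they are close to 0. The error therefore decays as it passes through the layers and may vanish altogether, which makes the whole network difficult to train.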
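The size of this effect can be estimated with a small sketch. It isolates the activation-function factor by treating all weights as 1, which is a simplifying assumption, and the helper name `backpropagated_scale` is made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # largest value is 0.25, reached at z = 0

def tanh_derivative(z):
    return 1.0 - math.tanh(z) ** 2  # largest value is 1, reached at z = 0

def backpropagated_scale(pre_activations, derivative):
    """Product of the activation derivatives met on the way back through the layers.

    Weights are taken to be 1 so that only the activation's effect is visible.
    """
    scale = 1.0
    for z in pre_activations:
        scale *= derivative(z)
    return scale

if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        best_case = backpropagated_scale([0.0] * depth, sigmoid_derivative)
        saturated = backpropagated_scale([5.0] * depth, sigmoid_derivative)
        print(depth, best_case, saturated)
```

Even in the most favourable case, with every pre-activation at 0, the logistic factor shrinks the error by a factor of 4 per layer. In the saturation region it shrinks far faster.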