EfficientNet Series (1): EfficientNetV2 Network Details
2022-07-19 03:50:00 【@BangBang】
EfficientNet Network Profile
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks is a paper published by Google in 2019. In it, the authors study how input resolution, network depth, and network width affect accuracy. Earlier work typically increased the image resolution, the network depth, or the network width separately to improve accuracy. In the EfficientNet paper, the authors instead use network architecture search (NAS) to explore the effect of scaling input resolution, network depth, and width jointly.
How well does EfficientNet perform?
The figure here, from the original paper, compares the Top-1 accuracy of EfficientNet with a series of mainstream classification networks of the time. We can see that EfficientNet not only has fewer parameters than many mainstream models, but its accuracy is also clearly better.
- The paper mentions that the proposed EfficientNet-B7 reached 84.3% top-1 accuracy on ImageNet, the highest of that year. Compared with GPipe, which held the highest accuracy before, it has only 1/8.4 as many parameters, and its inference speed is 6.1 times faster.
Network comparison (width, depth, resolution)
- Figure (a) shows a conventional convolutional neural network.
- Figure (b) increases only the width of the network in (a) (width means the number of channels of each feature layer).
- Figure (c) increases only the depth of the network in (a); it clearly has more layers than (a), so the network becomes deeper.
- Figure (d) increases the input resolution of the baseline network in (a); raising the image resolution increases the height and width of every feature matrix we obtain accordingly.
- Figure (e) increases the network width, the depth, and the resolution of the input image simultaneously.
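To make the resolution point in figure (d) concrete, here is a minimal PyTorch sketch (the layer and the input sizes are illustrative, not taken from the paper): the same convolution produces a feature map whose height and width grow with the input resolution.

```python
import torch
import torch.nn as nn

# A single 3x3 conv with stride 2, as in a typical stem layer.
conv = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)

for size in (224, 380):  # e.g. a B0-like vs. a B4-like input resolution
    x = torch.randn(1, 3, size, size)
    y = conv(x)
    # The output feature map's height/width grow with the input resolution.
    print(size, "->", tuple(y.shape))  # (1, 32, 112, 112) then (1, 32, 190, 190)
```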
- Based on past experience, increasing the depth of a network yields richer and more complex features that transfer well to other tasks. But if the network is too deep, gradients vanish and training becomes difficult.
- Increasing the width of a network captures finer-grained features and is easier to train, but wide, shallow networks often have difficulty learning higher-level features.
- Increasing the resolution of the input image can potentially yield finer-grained feature maps, but at very high input resolutions the accuracy gain diminishes, and large-resolution images also increase the amount of computation.
As the figure above shows, the scale-by-width, scale-by-depth, and scale-by-resolution curves (the three dotted lines) all saturate once accuracy reaches roughly 80% and stop improving. The red line, for which the network width, depth, and resolution are increased together, does not saturate at 80% accuracy and keeps improving. This shows that scaling the network's depth, width, and resolution at the same time gives better results. Moreover, at the same theoretical amount of computation, increasing depth, width, and resolution together also performs better.
EfficientNet-B0 Network
The EfficientNet-B0 network itself was also obtained by the authors through network architecture search; its detailed network parameters are listed in the following table.
- Looking at the table, EfficientNet-B0 consists of 9 stages in total. Stage 1 is a `3x3` convolutional layer. For `stage2~stage8` we can see that they repeatedly stack `MBConv` blocks, where `MBConv` is the MobileNet conv block, which will be discussed below. Stage 9 consists of 3 parts: a `Conv 1x1`, `Pooling`, and an `FC` layer.
- The resolution (`Resolution`) here is the height and width of the input feature matrix of each `Stage`. `Channels` is the number of channels of each `Stage`'s output feature matrix. `Layers` is how many times the corresponding `Operator` is repeated; for example, `stage3` has `Layers` = 2, so its `MBConv6` is repeated twice.
- The `stride` listed applies only to the first layer of each `Stage`; the strides of all the remaining layers are 1.
EfficientNet-B0 Network
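As a quick reference, here is the stage configuration from the table above written out as plain Python data (a sketch for readability; the field layout is my own, while the numbers follow the paper's Table 1):

```python
# (operator, kernel, stride, out_channels, layers) for EfficientNet-B0, stages 2-8.
# "MBConv1"/"MBConv6" encode the expansion factor n of the first 1x1 conv.
B0_STAGES = [
    ("MBConv1", 3, 1,  16, 1),  # stage 2
    ("MBConv6", 3, 2,  24, 2),  # stage 3
    ("MBConv6", 5, 2,  40, 2),  # stage 4
    ("MBConv6", 3, 2,  80, 3),  # stage 5
    ("MBConv6", 5, 1, 112, 3),  # stage 6
    ("MBConv6", 5, 2, 192, 4),  # stage 7
    ("MBConv6", 3, 1, 320, 1),  # stage 8
]
```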
MBConv module
The paper notes that MBConv is in fact the same block used in MobileNetV3. Let's take a brief look at the structure of the MBConv block used in EfficientNet.
- First, on the main branch there is a `1x1` convolution that is generally used to expand the channel dimension, followed by BN and the `Swish` activation function.
- Next comes a `DW` (depthwise) convolution whose kernel is `k x k`, where `k` may be `3` or `5`, and whose stride may be `1` or `2`.
- The output of the `DW` convolution then passes through `BN` and the `Swish` activation function, and then through an `SE` module.
- Next comes a `1x1` convolution; this `1x1` convolution serves to reduce the channel dimension. Note that it is followed only by `BN`, with no `Swish` activation function.
- This is followed by a `dropout` operation.
- Finally, the input feature matrix is carried over via the `shortcut` branch and added directly to the output feature matrix of the main branch to produce the corresponding output.
Here are a few points to note:
- For the first `1x1` expansion convolution, the number of convolution kernels is `n` times the `channel` count of the input feature matrix, where `n` is the expansion factor that appears in the `Operator` name; `MBConv6`, for example, corresponds to `n` = 6.
- For the last `1x1` reduction convolution in MBConv, the number of convolution kernels is set by the `Channels` column of the table above: the number of `1x1` kernels equals that `Channels` value.
- The second point to note concerns MBConv1, i.e. the case `n` = 1. Here the first `1x1` convolution is omitted: since that convolution exists mainly to expand the channel dimension, `n` = 1 means no expansion is needed. This corresponds to `Stage2` in the table, whose `operator` is `MBConv1`; its MBConv block therefore has no leading `1x1` convolution.
- The `shortcut` connection exists only when the feature matrix input to the `MBConv` structure and the output feature matrix have the same shape (see the PyTorch sketch after the SE module section below).
SE module
- First, a global average pooling operation is applied to each `channel` of the input `feature map`, followed by two fully connected layers.
- Note that the activation function of the first fully connected layer is the `Swish` activation function, while the activation function of the second fully connected layer is the `Sigmoid` activation function.
- The number of nodes in the first fully connected layer is 1/4 of the `channels` of the feature matrix input to the `MBConv` block, while the number of nodes in the second fully connected layer equals the `channels` of the `feature_map`, where `feature_map` is the output feature matrix of the `DW` convolution inside `MBConv`.
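Putting the MBConv structure and the SE module together, here is a minimal PyTorch sketch. It follows the description above; the class and argument names are my own, and details such as the dropout (which stands in for the stochastic-depth drop used in the original implementation) are simplified, so treat it as an illustration rather than the official code:

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-Excitation: global pool -> FC (Swish) -> FC (Sigmoid) -> scale."""
    def __init__(self, dw_channels: int, block_in_channels: int):
        super().__init__()
        squeeze = block_in_channels // 4  # 1/4 of the MBConv *input* channels
        # 1x1 convolutions act as the two fully connected layers.
        self.fc1 = nn.Conv2d(dw_channels, squeeze, kernel_size=1)
        self.act1 = nn.SiLU()   # SiLU is the same function as Swish
        self.fc2 = nn.Conv2d(squeeze, dw_channels, kernel_size=1)
        self.act2 = nn.Sigmoid()

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)  # global average pool per channel
        s = self.act1(self.fc1(s))
        s = self.act2(self.fc2(s))
        return x * s                           # channel-wise re-weighting

class MBConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel: int, stride: int,
                 expand: int, drop_rate: float = 0.0):
        super().__init__()
        mid = in_ch * expand
        layers = []
        if expand != 1:  # MBConv1 has no 1x1 expansion convolution
            layers += [nn.Conv2d(in_ch, mid, 1, bias=False),
                       nn.BatchNorm2d(mid), nn.SiLU()]
        layers += [
            # depthwise k x k convolution, stride 1 or 2
            nn.Conv2d(mid, mid, kernel, stride, padding=kernel // 2,
                      groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU(),
            SEModule(mid, in_ch),
            # 1x1 reduction convolution: BN only, no activation
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)
        # Shortcut only when input and output feature matrices match in shape.
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        # Simplified stand-in for the stochastic-depth drop of the paper.
        self.dropout = nn.Dropout2d(drop_rate)

    def forward(self, x):
        out = self.block(x)
        if self.use_shortcut:
            out = self.dropout(out) + x
        return out

# e.g. one stage-3 block of B0: MBConv6, k3x3, stride 2, 16 -> 24 channels
blk = MBConv(16, 24, kernel=3, stride=2, expand=6)
print(blk(torch.randn(1, 16, 112, 112)).shape)  # torch.Size([1, 24, 56, 56])
```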
EfficientNet-B0 ~ EfficientNet-B7 network parameters

The networks EfficientNet-B0 through EfficientNet-B7 all share the same structure; only parameter settings such as `input_size`, `width_coefficient`, and `depth_coefficient` differ.

- `width_coefficient` is the multiplier factor on the `channel` dimension. For example, the 3x3 convolution layer of Stage 1 in EfficientNet-B0 uses 32 convolution kernels, so in B6 this becomes 32 x 1.8 = 57.6, which is then rounded to the nearest multiple of 8, giving 56; the other stages are handled the same way.
- `depth_coefficient` is the multiplier factor on the `depth` dimension (it applies only to `Stage2` through `Stage8`). For example, Stage 7 of EfficientNet-B0 has L = 4, so in B6 this becomes 4 x 2.6 = 10.4, which is rounded up to 11. A small sketch of this rounding logic follows at the end of this section.
- `drop_connect_rate` is the random drop ratio of the `dropout` layers inside MBConv. Note that the drop ratio is not 0.2 for every MBConv layer: in the source implementation, the drop ratio of the `dropout` layers in the MBConv structures grows slowly from 0 up to the given `drop_connect_rate`.
- The last parameter, `dropout_rate`, is the drop ratio of the `dropout` layer before the final fully connected layer of `EfficientNet`.
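The rounding described above can be sketched as follows (a minimal reimplementation of the idea; the names mirror the `round_filters` / `round_repeats` helpers found in common EfficientNet implementations, but the code here is my own):

```python
import math

def round_filters(filters: int, width_coefficient: float, divisor: int = 8) -> int:
    """Scale a channel count and round to the nearest multiple of `divisor`."""
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # don't round down by more than 10%
        new_filters += divisor
    return int(new_filters)

def round_repeats(repeats: int, depth_coefficient: float) -> int:
    """Scale a layer count and round up."""
    return int(math.ceil(depth_coefficient * repeats))

# The two examples from the text (B6: width 1.8, depth 2.6):
print(round_filters(32, 1.8))  # 32 * 1.8 = 57.6 -> 56
print(round_repeats(4, 2.6))   # 4 * 2.6 = 10.4 -> 11
```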
Performance comparison
- Compared with ResNet-50 and DenseNet-169, EfficientNet-B0 achieves the highest accuracy with the fewest parameters and the lowest theoretical amount of computation. The B1 ~ B7 series of networks compare similarly.
- In practice, its accuracy really is high and its parameter count really is small; there is no doubt about that. But there is one problem when training it: it occupies a lot of GPU memory. In models such as B4, B5, B6, and B7 of the EfficientNet family, the resolution of the input image is very large, so the height and width of the output feature matrix of every layer grow correspondingly, and the GPU memory usage grows with them.
- Also, comparing speed directly via `Flops` is not quite right. In reality, the speed we care about is the inference speed on the device; real inference speed is not directly tied to `Flops` and is influenced by many other factors. So it would be more meaningful to report the inference time on some actual devices; a small timing sketch follows below.
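For completeness, here is one common way to time inference on a device in PyTorch (a generic sketch; it assumes a recent torchvision that provides `efficientnet_b0`, and the batch size and input resolution are placeholders):

```python
import time
import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.efficientnet_b0().to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):               # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()      # wait for queued GPU work to finish
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"mean latency: {elapsed / 100 * 1000:.2f} ms")
```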