7 Kinds of Visual MLPs (Part 2)
2022-07-19 05:47:00 【byzy】
1. RepMLP
Original paper: https://arxiv.org/pdf/2105.01883.pdf
RepMLP (re-parameterized MLP) starts from the observation that, compared with convolution, FC layers are not good at capturing local information. Its training-time and inference-time structures are different.
Training phase: the model is composed of a global perceptron, a partition perceptron, and a local perceptron.
Global perceptron

The feature map is divided into partitions. To capture the interaction between partitions, each partition is processed by average pooling, fed through BN and a 2-layer MLP, then reshaped and added onto the partition map, as in the sketch below.
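A minimal sketch of the global-perceptron idea, assuming PyTorch; the hidden width and the activation between the two FC layers are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn as nn

class GlobalPerceptron(nn.Module):
    """Sketch: average-pool each partition, run BN + a 2-layer MLP over the
    pooled values, then broadcast-add the result back onto the partition map."""
    def __init__(self, channels, num_partitions, hidden=64):  # hidden width: illustrative
        super().__init__()
        self.bn = nn.BatchNorm1d(channels * num_partitions)
        self.mlp = nn.Sequential(
            nn.Linear(channels * num_partitions, hidden),
            nn.ReLU(),                                         # activation: assumption
            nn.Linear(hidden, channels * num_partitions),
        )

    def forward(self, partitions):
        # partitions: (N, num_partitions, C, h, w)
        n, p, c, h, w = partitions.shape
        pooled = partitions.mean(dim=(3, 4)).reshape(n, p * c)  # average pooling per partition
        out = self.mlp(self.bn(pooled)).reshape(n, p, c, 1, 1)  # BN + 2-layer MLP, then reshape
        return partitions + out                                  # add onto the partition map
```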
Partition perceptron
It consists of an FC layer and a BN layer, taking the partition map as input. The FC layer is a group FC (analogous to group convolution) to reduce the number of parameters.
A group FC can be implemented with a grouped 1×1 convolution as follows: (1) reshape the input into a feature map whose spatial size is 1×1; (2) process it with a 1×1 convolution with g groups; (3) reshape the resulting feature map back into a matrix. That is,

gMMUL(M, W, g) = RS(gCONV(RS(M, (N, P, 1, 1)), F, g), (N, Q)),

where RS denotes reshape and gMMUL denotes the group FC (group matrix multiplication).
Here W is the weight matrix of the group FC (its size should be Q × (P/g)), F is the converted group-convolution kernel (Q kernels, each of size (P/g) × 1 × 1), and P and Q are the input and output dimensions of the FC layer, respectively. For the partition perceptron, P = C·h·w and Q = O·h·w, and C and O should both be divisible by g.
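A small numerical check of this equivalence, assuming PyTorch: the group FC is computed once as a grouped 1×1 convolution (steps 1–3 above) and once as a block-wise matrix multiplication. The values of N, P, Q and g below are arbitrary illustrative choices.

```python
import torch
import torch.nn.functional as F

N, P, Q, g = 2, 12, 8, 4          # batch, FC input dim, FC output dim, groups (illustrative)
x = torch.randn(N, P)

# Grouped 1x1 conv kernel: Q output channels, each sees P/g input channels.
weight = torch.randn(Q, P // g, 1, 1)

# (1) reshape the input into a feature map with 1x1 spatial size
x_map = x.reshape(N, P, 1, 1)
# (2) grouped 1x1 convolution
y_map = F.conv2d(x_map, weight, groups=g)
# (3) reshape back to (N, Q)
y_conv = y_map.reshape(N, Q)

# Reference: the same group FC as a block-diagonal matrix multiplication.
blocks = weight.reshape(g, Q // g, P // g)            # per-group weight blocks
y_fc = torch.cat(
    [x[:, i * (P // g):(i + 1) * (P // g)] @ blocks[i].t() for i in range(g)],
    dim=1,
)
print(torch.allclose(y_conv, y_fc, atol=1e-6))        # True
```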
Local perceptron
The partition map is passed through several parallel convolution layers (padded so the resolution matches the input, each followed by BN); the number of convolution groups should be the same as in the partition perceptron (see the sketch below).
Finally, all convolution outputs and the output of the partition perceptron are added together, the shape is restored, and the final output is obtained.
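A hedged sketch of the local perceptron, assuming PyTorch; the kernel sizes of the parallel branches are illustrative. The point is only that each branch preserves resolution via padding, uses the same number of groups as the partition perceptron, and is followed by BN.

```python
import torch
import torch.nn as nn

class LocalPerceptron(nn.Module):
    """Parallel conv branches over the partition map; their outputs are summed
    (and, in RepMLP, added to the partition-perceptron output)."""
    def __init__(self, channels, groups, kernel_sizes=(1, 3, 5)):  # kernel sizes: illustrative
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2,
                          groups=groups, bias=False),   # same groups as the partition perceptron
                nn.BatchNorm2d(channels),
            )
            for k in kernel_sizes
        )

    def forward(self, partition_map):
        # partition_map: (N * num_partitions, C, h, w)
        return sum(branch(partition_map) for branch in self.branches)
```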
Inference stage: RepMLP is converted into 3 FC layers.
The key lies in two steps:
1. Merge BN into the preceding convolution. With BN statistics μ, σ² and affine parameters γ, β, the folded kernel and bias are F' = (γ/σ)·F and b' = β − μ·γ/σ.
2. Convert the convolution into an FC layer. Let I be the identity matrix of dimension C·h·w, reshaped into C·h·w inputs of shape (C, h, w); pushing it through the convolution gives the equivalent FC weight:

W(F, p) = RS(CONV(RS(I, (Chw, C, h, w)), F, p), (Chw, Ohw))ᵀ,

where p is the padding, F is the convolution kernel, and W(F, p) is the weight of the equivalent FC layer.
In this way, FC3 can be merged with the convolutions of the local perceptron.
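A minimal sketch of the two re-parameterization steps, assuming PyTorch: folding BN into the preceding convolution, and building the FC weight equivalent to a padded convolution by pushing a reshaped identity matrix through it. The concrete sizes in the check are illustrative.

```python
import torch
import torch.nn.functional as F

def fuse_conv_bn(conv_w, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """Fold BN statistics into the convolution kernel and bias."""
    std = (bn_var + eps).sqrt()
    fused_w = conv_w * (bn_gamma / std).reshape(-1, 1, 1, 1)
    fused_b = bn_beta - bn_mean * bn_gamma / std
    return fused_w, fused_b

def conv_to_fc(conv_w, conv_b, C, h, w, padding):
    """Build the FC weight equivalent to a conv on (C, h, w) inputs by
    convolving a reshaped identity matrix (the identity trick above)."""
    O = conv_w.shape[0]
    I = torch.eye(C * h * w).reshape(C * h * w, C, h, w)   # Chw one-hot "images"
    out = F.conv2d(I, conv_w, padding=padding)             # (Chw, O, h, w)
    fc_w = out.reshape(C * h * w, O * h * w).t()           # (Ohw, Chw)
    fc_b = conv_b.repeat_interleave(h * w)                 # bias repeated over spatial positions
    return fc_w, fc_b

# Numerical check with illustrative sizes
C, O, h, w, k = 3, 4, 5, 5, 3
x = torch.randn(2, C, h, w)
conv_w, conv_b = torch.randn(O, C, k, k), torch.randn(O)

y_conv = F.conv2d(x, conv_w, conv_b, padding=k // 2).reshape(2, -1)
fc_w, fc_b = conv_to_fc(conv_w, conv_b, C, h, w, padding=k // 2)
y_fc = x.reshape(2, -1) @ fc_w.t() + fc_b
print(torch.allclose(y_conv, y_fc, atol=1e-4))             # True
```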
2. ResMLP
Original paper: https://arxiv.org/pdf/2105.03404.pdf
"Res" stands for residual.
Model structure
First, the original image is divided into patches, each of which is linearly embedded into a d-dimensional token and fed into ResMLP. In the figure, A denotes the per-column affine transformation and T denotes transposition.
Residual Multi-Perceptron layer
Each block is a linear (cross-patch) layer plus a feed-forward layer. LN is not used; instead an affine transformation is applied to each column:

Aff(x) = Diag(α)·x + β,

where α and β are learnable vectors. This transformation is applied twice in each residual block (the two instances are called pre and post), and at inference time they are folded into the adjacent linear layer.
The feed-forward network is the same as in a Transformer: a two-layer MLP, with the activation function replaced by GELU.
The block can be written as

Z = X + Aff((A·Aff(X)ᵀ)ᵀ)
Y = Z + Aff(C·GELU(B·Aff(Z)))

where A is the weight of the linear (cross-patch) layer, of dimension N×N (N is the number of patches), B is of dimension 4d×d, and C is of dimension d×4d.
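A minimal sketch of one ResMLP residual block matching the formulas above, assuming PyTorch; initialization details are omitted.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Aff(x) = alpha * x + beta, applied per channel (i.e. per column)."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):                 # x: (batch, N patches, d)
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    def __init__(self, num_patches, dim):
        super().__init__()
        self.aff1, self.aff2 = Affine(dim), Affine(dim)   # pre / post of the linear sub-block
        self.aff3, self.aff4 = Affine(dim), Affine(dim)   # pre / post of the FFN sub-block
        self.cross_patch = nn.Linear(num_patches, num_patches)   # A: N x N
        self.ffn = nn.Sequential(                                 # B: 4d x d, C: d x 4d
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):                 # x: (batch, N, d)
        # Z = X + Aff((A Aff(X)^T)^T): mix information across patches
        z = x + self.aff2(self.cross_patch(self.aff1(x).transpose(1, 2)).transpose(1, 2))
        # Y = Z + Aff(C GELU(B Aff(Z))): per-patch feed-forward
        return z + self.aff4(self.ffn(self.aff3(z)))
```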
3. S²-MLPv2
Original paper: https://arxiv.org/pdf/2108.01072.pdf
S²-MLP
patch embedding layer + several S²-MLP blocks + classification head
The patch embedding layer divides the image into patches, and each patch is mapped through an FC layer to a c-dimensional vector.
S²-MLP block
It consists of four MLPs acting on the channel dimension plus a spatial-shift layer.
Spatial shift: the feature map is split into 4 groups along the channel dimension, and the groups are shifted by 1 unit along the positive and negative directions of width and height, respectively (see the sketch below).
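A minimal sketch of the spatial-shift operation, assuming PyTorch and a (batch, H, W, C) layout; the assignment of channel groups to shift directions follows one common implementation, and border positions simply keep their original values.

```python
import torch

def spatial_shift(x):
    """x: (batch, H, W, C). Shift each quarter of the channels by one position
    along +W, -W, +H, -H respectively; positions with no source keep their values."""
    b, h, w, c = x.shape
    out = x.clone()
    out[:, :, 1:, :c // 4]           = x[:, :, :w - 1, :c // 4]               # shift right (+W)
    out[:, :, :w - 1, c // 4:c // 2] = x[:, :, 1:, c // 4:c // 2]             # shift left  (-W)
    out[:, 1:, :, c // 2:3 * c // 4] = x[:, :h - 1, :, c // 2:3 * c // 4]     # shift down  (+H)
    out[:, :h - 1, :, 3 * c // 4:]   = x[:, 1:, :, 3 * c // 4:]               # shift up    (-H)
    return out

# Usage: y = spatial_shift(torch.randn(2, 14, 14, 64))
```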
Split attention
Let X_k (k = 1, …, K) be feature maps of size n×c, where n is the number of patches and c is the number of channels. Summing along the spatial dimension gives a c-dimensional vector a:

a = Σ_k 1·X_k,

where 1 is the all-ones row vector of length n. An MLP then maps a to a Kc-dimensional vector â = W₂σ(W₁a) (σ is GELU), which is reshaped into a K×c matrix Â. A softmax along the first dimension yields Ā, and the new feature map is

Y = Σ_k Ā[k, :] ⊙ X_k,

where ⊙ denotes element-wise multiplication (broadcast over the n rows of X_k).
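A hedged sketch of split attention as reconstructed above, assuming PyTorch; the hidden width of the MLP is an illustrative choice.

```python
import torch
import torch.nn as nn

class SplitAttention(nn.Module):
    def __init__(self, channels, k=3, hidden=None):
        super().__init__()
        hidden = hidden or channels            # hidden width: assumption
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.GELU(), nn.Linear(hidden, k * channels)
        )

    def forward(self, xs):
        # xs: (batch, K, n, c) -- K feature maps, n patches, c channels
        b, k, n, c = xs.shape
        a = xs.sum(dim=(1, 2))                        # sum over branches and patches -> (b, c)
        a_hat = self.mlp(a).reshape(b, k, c)          # Kc logits reshaped to (b, K, c)
        a_bar = a_hat.softmax(dim=1)                  # softmax over the K branches
        return (a_bar.unsqueeze(2) * xs).sum(dim=1)   # weighted sum -> (b, n, c)
```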
S²-MLPv2: patch embedding layer + several S²-MLPv2 blocks + classification head.
An S²-MLPv2 block contains an S²-MLPv2 component and a channel-mixing MLP (CM-MLP). The CM-MLP has the same structure as in MLP-Mixer (see Section 2 of "7 Kinds of Visual MLPs (Part 1)").
S²-MLPv2 Block structure
The channel dimension of the input X (of size w×h×c) is first expanded to 3c. The result is split into 3 feature maps, each of size w×h×c; two of them are spatially shifted as shown in the figure, and the third remains unchanged. The 3 feature maps are then reshaped into n×c matrices, fused by split attention, and finally passed through an MLP.
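Putting the pieces together, a hedged sketch of the S²-MLPv2 component, assuming PyTorch and reusing spatial_shift and SplitAttention from the sketches above; the way the second branch is shifted in the opposite directions (via flipping) is an illustrative assumption, not necessarily the paper's exact shift pattern.

```python
import torch
import torch.nn as nn

class S2MLPv2Component(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.expand = nn.Linear(channels, 3 * channels)    # c -> 3c
        self.split_attention = SplitAttention(channels, k=3)
        self.project = nn.Linear(channels, channels)       # final MLP after fusion

    def forward(self, x):
        # x: (batch, H, W, c)
        b, h, w, c = x.shape
        x = self.expand(x)                                  # (b, H, W, 3c)
        x1, x2, x3 = x.chunk(3, dim=-1)
        x1 = spatial_shift(x1)                              # shifted branch
        x2 = spatial_shift(x2.flip(dims=(1, 2))).flip(dims=(1, 2))  # opposite shifts (assumption)
        xs = torch.stack(
            [x1.reshape(b, h * w, c), x2.reshape(b, h * w, c), x3.reshape(b, h * w, c)],
            dim=1,
        )                                                   # (b, 3, n, c)
        y = self.split_attention(xs)                        # (b, n, c)
        return self.project(y).reshape(b, h, w, c)
```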