In-Depth Understanding of Machine Learning - Imbalanced Learning: Sampling Techniques - The ADASYN Artificial Sampling Method
2022-07-19 03:12:00 【von Neumann】
Contents: In-Depth Understanding of Machine Learning - General Table of Contents
Like the Borderline-SMOTE algorithm, ADASYN (Adaptive Synthetic Sampling) is an improved version of SMOTE. The algorithm was proposed in 2008. Its main idea is to make full use of the density distribution of the samples to determine how often each minority sample is used as a seed sample, synthesizing more training data for the minority samples that are hard to learn, so as to correct, as far as possible, the negative effects of the imbalanced class distribution.
The ADASYN algorithm first determines the number of new minority samples to generate, namely $N^+ \times \text{SR}$. It then finds, for each minority sample $x_i^+,\ i=1,2,\cdots,N^+$, its $K$ nearest neighbors in the original training set $S$, and records the number of majority-class neighbors of the $i$-th minority sample as $N_i^\text{major}$. The proportion parameter $\Gamma_i$ of each minority sample is then given by:
$$\Gamma_i=\frac{N_i^\text{major}}{Z\times K}$$
where $Z$ is a normalization factor that guarantees $\sum_i \Gamma_i=1$. Once the proportion parameters are determined, the number of times each minority sample is selected as a seed sample is given by:
$$g_i=\Gamma_i\times N^+\times\text{SR}$$
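As a small numeric illustration of the two formulas above, the following sketch computes $\Gamma_i$ and $g_i$ from hypothetical majority-neighbor counts. All values here are made up for illustration; in practice $N_i^\text{major}$ comes from an actual $K$-nearest-neighbor search.

```python
# Toy illustration of the Gamma_i and g_i formulas. The neighbor counts
# below are made up; in practice they come from a K-nearest-neighbor search.
K = 5                    # neighborhood size
n_minority = 4           # N^+: number of minority samples
sampling_rate = 2        # SR: sampling rate

# Hypothetical N_i^major: majority-class neighbors of each minority sample
n_major_neighbors = [5, 3, 1, 0]

# r_i = N_i^major / K, then divide by Z so the Gamma_i sum to 1
ratios = [n / K for n in n_major_neighbors]
Z = sum(ratios)
gamma = [r / Z for r in ratios]

# g_i: how many synthetic samples to generate around minority sample i
g = [round(G * n_minority * sampling_rate) for G in gamma]
print(gamma)  # samples with more majority neighbors get larger Gamma_i
print(g)
```

Note how the budget of $N^+ \times \text{SR} = 8$ new samples is allocated: the boundary sample with five majority-class neighbors receives four of the eight new samples, while the interior sample with none receives zero.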
It is easy to see from the formula above that, like Borderline-SMOTE, the ADASYN algorithm pays more attention to the minority samples located near the decision boundary: they are selected as seed samples far more often than those located deep inside the minority region. Of course, this also further amplifies the propagation of minority-class noise. The specific flow of the ADASYN algorithm is as follows:
ADASYN Sampling method
Input: training set $S=\{(x_i, y_i),\ i=1,2,\cdots,N,\ y_i\in\{+,-\}\}$; number of majority-class samples $N^-$ and minority-class samples $N^+$, with $N^-+N^+=N$; imbalance ratio $\text{IR}=\frac{N^-}{N^+}$; sampling rate $\text{SR}$; neighborhood parameter $K$
Output: over-sampled training set $S'=\{(x_i, y_i),\ i=1,2,\cdots,N+N^+\times\text{SR},\ y_i\in\{+,-\}\}$
(1) Split the training set $S$ into the majority-class sample set $S^-$ and the minority-class sample set $S^+$
(2) Initialize the newly generated sample set $S^\text{New}$ to the empty set
(3) for $i=1:N^+$
(4)     take the sample $x_i$ from $S^+$
(5)     find the $K$ nearest neighbors of $x_i$ in $S$, and record the number of majority-class neighbors as $N_i^\text{major}$
(6)     compute its proportion parameter: $\Gamma_i=\frac{N_i^\text{major}}{Z\times K}$
(7)     compute the seed-sample frequency: $g_i=\Gamma_i\times N^+\times\text{SR}$
(8) for $i=1:N^+$
(9)     take the seed sample $x_i$ from $S^+$
(10)        for $j=1:g_i$
(11)            call the SMOTE procedure to generate a new sample $x_i^\text{new}$ from the seed sample $x_i$
(12)            add $x_i^\text{new}$ to $S^\text{New}$: $S^\text{New}=S^\text{New}\cup\{x_i^\text{new}\}$
(13) return the over-sampled training set $S'=S\cup S^\text{New}$
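The procedure above can be sketched in pure Python. This is a minimal illustration under assumptions made here for brevity, not a production implementation: the toy data, parameter values, rounding of $g_i$, and the simple linear-scan neighbor search are all choices of this sketch.

```python
import math
import random

def adasyn(majority, minority, sr=1, k=5, seed=0):
    """Minimal ADASYN sketch. majority/minority are lists of feature tuples;
    returns the list S^New of synthetic minority samples."""
    rng = random.Random(seed)
    s_all = majority + minority
    n_plus = len(minority)

    # Steps (3)-(5): count majority-class points among each minority
    # sample's k nearest neighbors in the full training set S.
    ratios = []
    for x in minority:
        neighbors = sorted((p for p in s_all if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        n_major = sum(1 for p in neighbors if p in majority)
        ratios.append(n_major / k)

    # Steps (6)-(7): normalize to Gamma_i and derive generation counts g_i.
    z = sum(ratios)
    if z == 0:                 # no minority sample has majority neighbors
        return []
    g = [round(r / z * n_plus * sr) for r in ratios]

    # Steps (8)-(12): SMOTE-style interpolation between each seed sample
    # and a randomly chosen one of its k nearest minority neighbors.
    new_samples = []
    for x, gi in zip(minority, g):
        min_neighbors = sorted((p for p in minority if p is not x),
                               key=lambda p: math.dist(x, p))[:k]
        if not min_neighbors:
            continue
        for _ in range(gi):
            nb = rng.choice(min_neighbors)
            lam = rng.random()
            new_samples.append(tuple(xj + lam * (nj - xj)
                                     for xj, nj in zip(x, nb)))
    return new_samples

# Toy data: 12 majority points near the origin, 4 minority points near (3, 3).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(12)]
minority = [(3 + rng.gauss(0, 0.5), 3 + rng.gauss(0, 0.5)) for _ in range(4)]
synthetic = adasyn(majority, minority, sr=2, k=5)
print(len(synthetic))  # roughly N^+ x SR = 8 (rounding can shift the total)
```

In practice one would normally reach for a tested implementation such as the `ADASYN` class in the imbalanced-learn library, whose `n_neighbors` parameter plays the role of $K$ here.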