Bagging and Boosting
Bagging makes use of the concept of ‘collective intelligence’, where several weak learners
(each only moderately accurate, but better than random guessing, i.e. above 50% on a balanced two-class problem)
are combined to make a final prediction, either by majority vote or, for classification problems,
by averaging the probabilities predicted by each learner.
Since the learners are trained independently of each other, bagging is considered a parallel procedure.
The figure below demonstrates the bagging mechanism, where various learning sets (L1, L2, ..., LB) are
created out of our data L and a predictor is created for each of these learning sets. Eventually, the predictions of all the predictors are aggregated to form the final prediction.
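The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the learning sets are drawn by sampling with replacement (the bootstrap, which is the usual choice in bagging), and the toy data, the helper names `bootstrap_sets` and `fit_stump`, and the one-feature threshold classifier are all hypothetical choices made for the example.

```python
import random

def bootstrap_sets(data, n_sets, seed=0):
    """Draw n_sets learning sets, each sampled with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_sets)]

def fit_stump(learning_set):
    """Fit a one-feature threshold classifier that predicts 1 when x > threshold.

    The threshold is chosen among the observed x values to minimise the
    number of misclassified points in the learning set.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in learning_set:
        errors = sum((x > threshold) != bool(y) for x, y in learning_set)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors

    def predict(x):
        return int(x > best_threshold)

    return predict

# Toy data L: (feature, label) pairs, hypothetical values for illustration.
L = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1), (2.5, 1), (3.0, 1)]

learning_sets = bootstrap_sets(L, n_sets=5)          # L1, ..., LB with B = 5
predictors = [fit_stump(s) for s in learning_sets]   # one predictor per learning set
```

Each predictor sees a slightly different sample of L, so the fitted thresholds may differ from one learning set to the next; that variation is what the aggregation step later averages out.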
How are predictions made using bagging?
If the dependent variable y is numeric, the aggregation is done by averaging the predictions of the individual predictors. If y is categorical, the aggregation can be done by either majority voting or weighted voting. If the classifiers return class probabilities,
one can instead average the predicted probabilities over all the learners.
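The aggregation rules above are simple enough to write out directly. The sketch below is only illustrative: the function names and the representation of class probabilities as dictionaries are assumptions made for the example, not part of any particular library.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(predictions):
    """Numeric y: average the predictions of the individual predictors."""
    return mean(predictions)

def majority_vote(labels):
    """Categorical y: return the label predicted by the most predictors."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Categorical y: each predictor's vote counts in proportion to its weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def average_probability(probabilities):
    """Average class probabilities; each entry maps class -> probability."""
    classes = probabilities[0].keys()
    return {c: mean(p[c] for p in probabilities) for c in classes}
```

For example, `weighted_vote(["a", "b", "a"], [0.2, 0.9, 0.3])` picks "b": the two votes for "a" carry a total weight of 0.5, which is less than the single vote of weight 0.9 for "b", whereas a plain majority vote on the same labels would pick "a".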
Since all the learners are independent of each other, bagging is a parallel procedure.
Although bagging sacrifices the interpretability of a single model's structure, it can lead to gains in accuracy.
Like bagging, boosting also aims at combining weak learners to form a powerful committee. However, instead of creating models in parallel, boosting develops the models sequentially.
Initially, all points are given the same weight and a base learner is trained. Afterwards, the points misclassified by that base learner are given more weight and the next base learner is trained on the reweighted data. This is repeated until the stopping criterion is met.
Since each new learner depends on the results of the previous learner, boosting techniques are sequential in nature.
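To make the reweighting step concrete, here is a minimal sketch of a single round. The text does not specify an update formula, so this example assumes the AdaBoost rule, one common choice; the function name `reweight` and the four-point example are hypothetical.

```python
import math

def reweight(weights, misclassified):
    """One AdaBoost-style weight update (assumed rule, see lead-in).

    weights: current point weights, summing to 1.
    misclassified: booleans, True where the base learner was wrong.
    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * math.log((1 - error) / error)  # the learner's say in the final vote
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated], alpha   # renormalise so weights sum to 1

# Four points with equal weight; the base learner misclassifies only the last one.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, False, False, True]
new_weights, alpha = reweight(weights, misclassified)
```

Under this rule the misclassified point ends up carrying half of the total weight after the update, so the next base learner cannot ignore it without incurring a weighted error of 0.5.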
In other words, each model depends on the results of the model before it, as shown
in Figure 2.10. The major goal of each new model is to correct the errors of the previous one. The final classifier is a
linear combination of these learners.
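The sequential procedure and the final linear combination can be put together in a short sketch. As before, this assumes AdaBoost-style weights and uses one-feature threshold classifiers (stumps) as the weak learners; the data, the function names and the choice of three rounds are hypothetical and serve only to illustrate the idea.

```python
import math

def stump_predict(x, threshold, sign):
    """Weak learner: predicts sign when x > threshold, and -sign otherwise."""
    return sign * (1 if x > threshold else -1)

def fit_weighted_stump(xs, ys, weights):
    """Return (error, threshold, sign) of the stump with the lowest weighted error."""
    best = None
    for threshold in xs:
        for sign in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, sign) != y)
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best

def boost(xs, ys, n_rounds):
    """Train stumps sequentially, reweighting the points after each round."""
    n = len(xs)
    weights = [1.0 / n] * n               # all points start with the same weight
    ensemble = []                         # list of (alpha, threshold, sign)
    for _ in range(n_rounds):
        error, threshold, sign = fit_weighted_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, sign))
        # Misclassified points (y * prediction = -1) gain weight, the rest lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Final classifier: sign of the alpha-weighted sum of the weak learners."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data with labels in {-1, +1}; no single threshold separates the classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = boost(xs, ys, n_rounds=3)
```

On this toy data the best single stump still misclassifies two of the six points, while the weighted combination of three stumps classifies all six correctly, which illustrates how each round corrects errors left by the previous ones.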