Advances and Prospects of the AdaBoost Algorithm
Abstract: AdaBoost is one of the most successful Boosting algorithms. It has a solid theoretical foundation and has achieved great success in practical applications. AdaBoost can boost a weak learning algorithm whose accuracy is only slightly better than random guessing into an arbitrarily accurate strong learning algorithm, providing a new approach and new design ideas for learning algorithms. This paper first reviews how Boosting, initially only a conjecture, was proved correct, and how this proof led to the origin of the AdaBoost algorithm. Second, the training error and generalization error of AdaBoost are analyzed to explain why AdaBoost can successfully improve the accuracy of a weak learning algorithm. Third, different theoretical models for analyzing AdaBoost are examined, together with the variant algorithms derived from these models. Fourth, extensions of binary-class AdaBoost to multiclass AdaBoost are described. Applications of AdaBoost and its variants to practical problems are also introduced. Throughout, the paper uses AdaBoost and its variants as a lens on Boosting theory, which occupies an important place in ensemble learning, tracing the development of the theory and its future research directions so as to provide useful leads for interested researchers. Finally, open problems that merit further study are discussed. For Boosting theory, these include deriving a tighter generalization error bound and formulating a more precise weak learning condition for multiclass problems. For AdaBoost, promising directions include loss functions better suited to multiclass problems, more accurate stopping conditions for the iterations, stronger robustness to noise, and improving accuracy by optimizing the diversity of the base classifiers.
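The boosting procedure the abstract describes — reweighting the training examples so that each new weak learner concentrates on the mistakes of its predecessors, then combining the learners by a weighted vote — can be sketched as a minimal binary AdaBoost with decision stumps. This is a pure-NumPy illustration under our own naming, not the paper's implementation:

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the decision stump (feature, threshold, polarity) with lowest weighted error."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, weighted error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=20):
    """Binary AdaBoost (labels in {-1, +1}) over decision stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # start from uniform example weights
    ensemble = []
    for _ in range(T):
        j, thr, pol, err = train_stump(X, y, w)
        err = max(err, 1e-10)            # guard against a perfect stump (division by zero)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight grows as error shrinks
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified examples
        w /= w.sum()                     # renormalize to a distribution
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted majority vote of the stumps."""
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
                for a, j, t, p in ensemble)
    return np.sign(score)
```

The exponential reweighting step is exactly what lets a stream of barely-better-than-random stumps accumulate into a low-error vote, which is the "weak to strong" claim the abstract summarizes.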
Key words:
- Ensemble learning /
- Boosting /
- AdaBoost /
- generalization error /
- classification margin /
- multiclass classification
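As a compact reference for the error analysis the abstract mentions, the classical training-error bound of AdaBoost (due to Freund and Schapire) and the margin quantity used in the generalization analysis (due to Schapire, Freund, Bartlett, and Lee) can be written as follows, where ε_t is the weighted error of the t-th weak learner and γ_t = 1/2 − ε_t is its edge over random guessing:

```latex
% Training error of the final vote H(x) = \mathrm{sign}\bigl(\sum_t \alpha_t h_t(x)\bigr):
\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\!\left[H(x_i)\neq y_i\right]
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t(1-\varepsilon_t)}
  \;=\; \prod_{t=1}^{T}\sqrt{1-4\gamma_t^{2}}
  \;\le\; \exp\!\Bigl(-2\sum_{t=1}^{T}\gamma_t^{2}\Bigr)

% Normalized classification margin of an example (x, y):
\operatorname{margin}(x,y) \;=\; \frac{y\sum_{t=1}^{T}\alpha_t h_t(x)}{\sum_{t=1}^{T}|\alpha_t|}
  \;\in\; [-1, 1]
```

So as long as every weak learner keeps a positive edge γ_t ≥ γ > 0, the training error falls exponentially in T, while the distribution of margins — not the training error itself — is what the margin theory uses to explain AdaBoost's resistance to overfitting.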