Abstract: Measuring the complexity of a hypothesis space plays a crucial role in statistical learning theory when analyzing the generalization ability of learning models. Unlike data-independent complexity measures, Rademacher complexity is data-dependent and therefore typically yields tighter generalization bounds than traditional complexity measures. In recent years, Rademacher complexity has attracted increasing attention and has played an important role in the development of generalization analysis in statistical learning theory. In view of this importance, this paper reviews the various forms of Rademacher complexity and their relations to traditional complexity measures, and describes the basic techniques of Rademacher-complexity-based generalization analysis. Considering both independent and identically distributed (i.i.d.) and non-i.i.d. data-generating environments, we summarize and analyze the state of the art of Rademacher complexity generalization bounds. Finally, we discuss the shortcomings of current research, notably in unsupervised frameworks and non-sequential dependent environments, and point out directions for further application and development.
Keywords:
- Machine learning /
- Statistical learning theory /
- Generalization bound /
- Rademacher complexity
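For reference, the central quantity surveyed here is the empirical Rademacher complexity; the following is its standard textbook form (the notation $\mathcal{F}$, $S$, $\sigma_i$ follows common usage, e.g. [19, 65], rather than this paper's own symbols):

$$ \hat{\mathfrak{R}}_{S}(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\left[\sup_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^{n}\sigma_{i}f(x_{i})\right], \qquad \Pr(\sigma_{i}=+1)=\Pr(\sigma_{i}=-1)=\frac{1}{2}, $$

where $S=\{x_1,\dots,x_n\}$ is the sample and the $\sigma_i$ are i.i.d. Rademacher variables; the distribution-dependent Rademacher complexity is $\mathfrak{R}_n(\mathcal{F})=\mathbb{E}_{S}[\hat{\mathfrak{R}}_{S}(\mathcal{F})]$. Because the supremum is evaluated on the actual sample, the measure adapts to the data distribution, which is the source of the tighter bounds mentioned above.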
Table 1 Rademacher complexities and traditional complexity measures of function classes

| Complexity type | Data-generating environment | Complexity measures |
| --- | --- | --- |
| Traditional complexities | i.i.d. / non-i.i.d. | VC entropy, annealed VC entropy, growth function, VC dimension, covering numbers, pseudo-dimension, fat-shattering dimension, etc. |
| Rademacher complexities | i.i.d. | classical Rademacher complexity, local Rademacher complexity, Rademacher chaos complexity, uni-modal Rademacher complexity, multi-modal Rademacher complexity, Dropout Rademacher complexity |
| Rademacher complexities | non-i.i.d. | independent non-identically distributed Rademacher complexity, block Rademacher complexity, sequential Rademacher complexity |
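To make the data dependence of these measures concrete, the following is a minimal Monte Carlo sketch (not from the paper; the helper `empirical_rademacher` and the finite random-classifier class are illustrative assumptions) that estimates the empirical Rademacher complexity of a finite set of hypotheses on a fixed sample:

```python
import numpy as np

def empirical_rademacher(outputs, n_trials=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_f (1/n) sum_i sigma_i f(x_i) ]
    for a *finite* set of hypotheses evaluated on a fixed sample.

    outputs: (m, n) array whose row j is (f_j(x_1), ..., f_j(x_n)).
    """
    rng = np.random.default_rng(seed)
    m, n = outputs.shape
    estimates = np.empty(n_trials)
    for t in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)      # i.i.d. Rademacher signs
        estimates[t] = np.max(outputs @ sigma) / n   # sup over the finite class
    return estimates.mean()

# Illustration: 200 random linear classifiers sign(<w, x>) on a sample of size n.
rng = np.random.default_rng(42)
n, d = 100, 5
X = rng.normal(size=(n, d))                          # i.i.d. sample
W = rng.normal(size=(200, d))                        # finite sub-class of hypotheses
outputs = np.sign(X @ W.T).T                         # (200, n): f_j(x_i) in {-1, +1}
print(empirical_rademacher(outputs))                 # decays roughly like O(1/sqrt(n))
```

For a finite class of $m$ hypotheses with values in $[-1,1]$, Massart's lemma [50] bounds this quantity by $\sqrt{2\ln m/n}$, so the printed estimate shrinks as the sample grows, in line with the $O\left(\frac{1}{\sqrt{n}}\right)$ entries in Tables 2 and 3 below.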
Table 2 Generalization analysis in the i.i.d. setting
| Data-generating environment | Learning strategy | Hypothesis space | Generalization rate |
| --- | --- | --- | --- |
| i.i.d. | ERM | 1) $\mathcal{X}\rightarrow\{-1,+1\}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [52], $O\left(\frac{1}{n}\right)$ [53-55], etc. |
| | | 2) $\mathcal{X}\rightarrow{\bf R}$ | $O\left(\frac{1}{n}\right)$ [47], $O\left(\left(\frac{\log^{p}n}{n}\right)^{1/(2-\alpha)}\right)$ [28, 32], etc. |
| | | 3) ${\bf R}^{d}\rightarrow{\bf R}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [51] |
| | | 4) $\mathcal{X}\rightarrow\{-1,+1\}^{L}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [56], $O\left(\frac{\log n}{n}\right)$ [33] |
| | | 5) free multivariate spline space | $O\left(\left(\frac{\log n}{n}\right)^{2\alpha/(1+2\alpha)}\right)$ [31-32], etc. |
| | | 6) $\mathcal{X}\times R_{s}\rightarrow{\bf R}$ | $O\left(\frac{p^{(k+1)/2}}{\sqrt{n}}+\frac{1}{\sqrt{n}}\right)$, $O\left(\frac{p^{k+1}}{\sqrt{n}}+\frac{1}{\sqrt{n}}\right)$ [34] |
| | Regularization | 7) RKHS | $O\left(\frac{\sqrt{{\rm tr}(G)}}{n}+\frac{1}{\sqrt{n}}\right)$ [48] |
| | | 8) $S^{d}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [36], etc. |
| | | 9) $S^{d\times(md)}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [27, 32], etc. |
| | SRM | 10) neural network space | $O\left(\left(\frac{1}{n}\right)^{\frac{1}{2-\alpha_{p}}}\right)$ [30, 32] |
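Most $O\left(\frac{1}{\sqrt{n}}\right)$ entries above instantiate the same template bound; a standard form (as in [19, 65]; constants differ across the cited works, so this is indicative rather than the exact statement used in each reference) for a class $\mathcal{F}$ of $[0,1]$-valued functions reads: with probability at least $1-\delta$ over an i.i.d. sample $z_1,\dots,z_n$, every $f\in\mathcal{F}$ satisfies

$$ \mathbb{E}[f(z)] \;\le\; \frac{1}{n}\sum_{i=1}^{n}f(z_{i}) \;+\; 2\,\mathfrak{R}_{n}(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}}. $$

Since $\mathfrak{R}_n(\mathcal{F})=O\left(\frac{1}{\sqrt{n}}\right)$ for many bounded classes, the slow rates follow directly, while the fast $O\left(\frac{1}{n}\right)$-type rates rely on refined arguments such as local Rademacher complexities [20-21, 28] or self-bounding properties of the global complexity [53-55].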
Table 3 Generalization analysis in the non-i.i.d. setting
| Data-generating environment | Hypothesis space | Generalization rate |
| --- | --- | --- |
| independent, non-identically distributed | 1) $\mathcal{X}\rightarrow\mathcal{Y}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [37] |
| stationary and $\beta$-mixing | 2) $\mathcal{X}\rightarrow{\bf R}$ | $O\left(\frac{\sqrt{{\rm tr}(G)}}{n}+\frac{1}{\sqrt{n}}\right)$ [38] |
| non-stationary and $\beta$-mixing | 3) $\mathcal{X}\rightarrow\mathcal{Y}$ | non-stationarity may prevent the bound from converging to 0 [42] |
| martingale | 4) $\mathcal{Z}\rightarrow[-1,1]$ | uniform convergence [39-41] |
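For the martingale row, the relevant measure in [39-41] is the sequential Rademacher complexity; in a standard formulation (notation follows Rakhlin et al. [40] rather than this paper), the fixed sample is replaced by a $\mathcal{Z}$-valued binary tree $\mathbf{z}=(z_1,\dots,z_n)$, where each $z_t$ maps sign sequences $(\sigma_1,\dots,\sigma_{t-1})$ to $\mathcal{Z}$:

$$ \mathfrak{R}^{\rm seq}_{n}(\mathcal{F}) \;=\; \sup_{\mathbf{z}}\, \mathbb{E}_{\sigma}\left[\sup_{f\in\mathcal{F}}\frac{1}{n}\sum_{t=1}^{n}\sigma_{t}\, f\big(z_{t}(\sigma_{1},\dots,\sigma_{t-1})\big)\right]. $$

By [40], a uniformly bounded class satisfies the uniform martingale law of large numbers exactly when $\mathfrak{R}^{\rm seq}_{n}(\mathcal{F})\rightarrow 0$ as $n\rightarrow\infty$, which is the sense of the "uniform convergence" entry in row 4).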
[1] Tikhonov A N, Arsenin V Y. Solution of Ill-posed Problems. Washington: Winston and Sons, 1977.
[2] Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6): 716-723. doi: 10.1109/TAC.1974.1100705
[3] Akaike H. A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics, 1978, 30(1): 9-14. doi: 10.1007/BF02480194
[4] Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2): 461-464. doi: 10.1214/aos/1176344136
[5] Kolmogorov A N. On tables of random numbers. Sankhya: The Indian Journal of Statistics, 1963, 25: 369-376
[6] Kolmogorov A N. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1965, 1(1): 1-7
[7] Chaitin G J. On the length of programs for computing finite binary sequences. Journal of the ACM, 1966, 13(4): 547-569. doi: 10.1145/321356.321363
[8] Solomonoff R J. A formal theory of inductive inference. Part II. Information and Control, 1964, 7(2): 224-254. doi: 10.1016/S0019-9958(64)90131-7
[9] Wallace C S, Boulton D M. An information measure for classification. The Computer Journal, 1968, 11(2): 185-194. doi: 10.1093/comjnl/11.2.185
[10] Vapnik V N, Chervonenkis A Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971, 16(2): 264-280. doi: 10.1137/1116025
[11] Vapnik V N. The Nature of Statistical Learning Theory. New York: Springer, 1995.
[12] Vapnik V N. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 1999, 10(5): 988-999. doi: 10.1109/72.788640
[13] Popper K R. The Logic of Scientific Discovery. United Kingdom: Hutchinson, 1959.
[14] Valiant L G. A theory of the learnable. Communications of the ACM, 1984, 27(11): 1134-1142. doi: 10.1145/1968.1972
[15] Kearns M J, Schapire R E. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 1994, 48(3): 464-497. doi: 10.1016/S0022-0000(05)80062-5
[16] Bartlett P L, Long P M, Williamson R C. Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 1996, 52(3): 434-452. doi: 10.1006/jcss.1996.0033
[17] Shawe-Taylor J, Bartlett P L, Williamson R C, Anthony M. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 1998, 44(5): 1926-1940. doi: 10.1109/18.705570
[18] Koltchinskii V. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 2001, 47(5): 1902-1914. doi: 10.1109/18.930926
[19] Bartlett P L, Mendelson S. Rademacher and Gaussian complexities: risk bounds and structural results. The Journal of Machine Learning Research, 2003, 3: 463-482
[20] Bartlett P L, Bousquet O, Mendelson S. Local Rademacher complexities. The Annals of Statistics, 2005, 33(4): 1497-1537. doi: 10.1214/009053605000000282
[21] Koltchinskii V. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 2006, 34(6): 2593-2656. doi: 10.1214/009053606000001019
[22] Koltchinskii V, Panchenko D. Rademacher processes and bounding the risk of function learning. High Dimensional Probability II. Boston: Birkhäuser, 2004. 443-457
[23] Bousquet O, Koltchinskii V, Panchenko D. Some local measures of complexity of convex hulls and generalization bounds. Computational Learning Theory. Sydney, Australia: Springer, 2004. 59-73
[24] Zhang Hai, Xu Zong-Ben. A survey on learning theory (I): stability and generalization. Chinese Journal of Engineering Mathematics, 2008, 25(1): 1-9 (in Chinese)
[25] Hu Zheng-Fa. The statistic complexity estimates and its application to machine learning. Acta Automatica Sinica, 2008, 34(10): 1332-1336 (in Chinese)
[26] Chen Jiang-Hong. Rademacher Complexities and the Generalization Performance of SVM [Master dissertation], Hubei University, China, 2005 (in Chinese)
[27] Lei Y W, Ying Y M. Generalization analysis of multi-modal metric learning [Online], available: https://www.researchgate.net/publication/282969709, November 3, 2015
[28] Lei Y W, Ding L X, Bi Y Z. Local Rademacher complexity bounds based on covering numbers [Online], available: http://arxiv.org/pdf/1510.01463v1.pdf, October 6, 2015
[29] Lei Y W, Ding L X. Refined Rademacher chaos complexity bounds with applications to the multikernel learning problem. Neural Computation, 2014, 26(4): 739-760. doi: 10.1162/NECO_a_00566
[30] Lei Y W, Ding L X, Zhang W S. Generalization performance of radial basis function networks. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(3): 551-564. doi: 10.1109/TNNLS.2014.2320280
[31] Lei Y W, Ding L X, Wu W L. Universal learning using free multivariate splines. Neurocomputing, 2013, 119(16): 253-263
[32] Lei Yun-Wen. Rademacher Complexities with Applications to Machine Learning [Ph.D. dissertation], Wuhan University, China, 2015 (in Chinese)
[33] Xu C, Liu T L, Tao D C, Xu C. Local Rademacher complexity for multi-label learning [Online], available: http://arxiv.org/pdf/1410.6990.pdf, October 26, 2014
[34] Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Science China Information Sciences, 2016, 59(7): 072104:1-072104:12. doi: 10.1007/s11432-015-5470-z
[35] de la Peña V, Giné E. Decoupling: From Dependence to Independence. New York: Springer-Verlag, 1999.
[36] Cao Q, Guo Z C, Ying Y M. Generalization bounds for metric and similarity learning. Machine Learning, 2016, 102(1): 115-132. doi: 10.1007/s10994-015-5499-7
[37] Mohri M, Medina A M. New analysis and algorithm for learning with drifting distributions. In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory. Lyon, France: Springer-Verlag, 2012. 124-138
[38] Mohri M, Rostamizadeh A. Rademacher complexity bounds for non-i.i.d. processes. In: Proceedings of the 2008 Advances in Neural Information Processing Systems 21. Vancouver, BC, Canada: Curran Associates, Inc., 2008. 1097-1104
[39] Rakhlin A, Sridharan K. On martingale extensions of Vapnik-Chervonenkis theory with applications to online learning. Measures of Complexity. Switzerland: Springer International Publishing, 2015. 197-215
[40] Rakhlin A, Sridharan K, Tewari A. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 2015, 161(1-2): 111-153. doi: 10.1007/s00440-013-0545-5
[41] Rakhlin A, Sridharan K, Tewari A. Online learning: random averages, combinatorial parameters, and learnability. In: Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada: Curran Associates, Inc., 2010. 1984-1992
[42] Kuznetsov V, Mohri M. Generalization bounds for time series prediction with non-stationary processes. In: Proceedings of the 25th International Conference on Algorithmic Learning Theory. Bled, Slovenia: Springer International Publishing, 2014. 260-274
[43] Kuznetsov V, Mohri M. Forecasting non-stationary time series: from theory to algorithms [Online], available: http://www.cims.nyu.edu/~vitaly/pub/fts.pdf, July 12, 2016
[44] Anguita D, Ghio A, Oneto L, Ridella S. A deep connection between the Vapnik-Chervonenkis entropy and the Rademacher complexity. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(12): 2202-2211. doi: 10.1109/TNNLS.2014.2307359
[45] Kääriäinen M. Relating the Rademacher and VC Bounds, Technical Report C-2004-57, Department of Computer Science, University of Helsinki, Finland, 2004.
[46] Srebro N, Sridharan K. Note on refined Dudley integral covering number bound [Online], available: http://ttic.uchicago.edu/~karthik/dudley.pdf, July 12, 2016
[47] Srebro N, Sridharan K, Tewari A. Smoothness, low noise and fast rates. In: Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada: Curran Associates, Inc., 2010. 2199-2207
[48] V'Yugin V V. VC dimension, fat-shattering dimension, Rademacher averages, and their applications. Measures of Complexity. Switzerland: Springer International Publishing, 2015. 57-74
[49] Ying Y M, Campbell C. Rademacher chaos complexities for learning the kernel problem. Neural Computation, 2010, 22(11): 2858-2886. doi: 10.1162/NECO_a_00028
[50] Massart P. Some applications of concentration inequalities to statistics. Annales de la Faculté des Sciences de Toulouse, 2000, 9(2): 245-303. doi: 10.5802/afst.961
[51] Gnecco G, Sanguineti M. Approximation error bounds via Rademacher's complexity. Applied Mathematical Sciences, 2008, 2(4): 153-176
[52] Boucheron S, Bousquet O, Lugosi G. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 2005, 9: 323-375. doi: 10.1051/ps:2005018
[53] Oneto L, Ghio A, Ridella S, Anguita D. Global Rademacher complexity bounds: from slow to fast convergence rates. Neural Processing Letters, 2016, 43(2): 567-602. doi: 10.1007/s11063-015-9429-2
[54] Oneto L, Ghio A, Ridella S, Anguita D. Fast convergence of extended Rademacher complexity bounds. In: Proceedings of the 2015 International Joint Conference on Neural Networks. Killarney, Ireland: IEEE, 2015. 1-10
[55] Oneto L, Ghio A, Anguita D, Ridella S. An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Networks, 2013, 44: 107-111. doi: 10.1016/j.neunet.2013.03.017
[56] Yu H F, Jain P, Kar P, Dhillon I S. Large-scale multi-label learning with missing labels. In: Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014. 593-601
[57] Ding Wan-Ding. Essentials of Measure Theory. Hefei: Anhui People's Publishing House, 2005 (in Chinese)
[58] Hastie T, Friedman J, Tibshirani R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2001.
[59] van der Vaart A W, Wellner J A. Weak Convergence and Empirical Processes. New York: Springer-Verlag, 1996.
[60] Ledoux M, Talagrand M. Probability in Banach Spaces: Isoperimetry and Processes. Berlin: Springer-Verlag, 1991.
[61] Dudley R M. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. Journal of Functional Analysis, 1967, 1(3): 290-330. doi: 10.1016/0022-1236(67)90017-1
[62] Lozano F. Model selection using Rademacher penalization. In: Proceedings of the 2nd ICSC Symposium on Neural Networks. London: Academic Press, 2000.
[63] Boucheron S, Lugosi G, Massart P. Concentration Inequalities: A Nonasymptotic Theory of Independence. United Kingdom: Oxford University Press, 2013.
[64] Zhou Zhi-Hua. Machine Learning. Beijing: Tsinghua University Press, 2016 (in Chinese)
[65] Mohri M, Rostamizadeh A, Talwalkar A. Foundations of Machine Learning. Cambridge: MIT Press, 2012.
[66] Wang Hong-Qiao, Sun Fu-Chun, Cai Yan-Ning, Chen Ning, Ding Lin-Ge. On multiple kernel learning methods. Acta Automatica Sinica, 2010, 36(8): 1037-1050 (in Chinese). doi: 10.3724/SP.J.1004.2010.01037
[67] McFee B, Lanckriet G. Learning multi-modal similarity. Journal of Machine Learning Research, 2011, 12: 491-523
[68] Xie P T, Xing E P. Multi-modal distance metric learning. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China: AAAI, 2013. 1806-1812
[69] Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors [Online], available: http://arxiv.org/pdf/1207.0580v1.pdf, July 3, 2012
[70] Wan L, Zeiler M, Zhang S X, LeCun Y, Fergus R. Regularization of neural networks using DropConnect. In: Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA, 2013. 1058-1066
[71] Rudin W. Functional Analysis (2nd edition). New York: McGraw-Hill, 1991.
[72] Mendelson S. A few notes on statistical learning theory. Advanced Lectures on Machine Learning. Berlin, Heidelberg: Springer, 2003. 1-40
[73] Sauer N. On the density of families of sets. Journal of Combinatorial Theory, Series A, 1972, 13(1): 145-147. doi: 10.1016/0097-3165(72)90019-2
[74] Pollard D. Convergence of Stochastic Processes. New York: Springer-Verlag, 1984.
[75] Duan H H. Bounding the fat shattering dimension of a composition function class built using a continuous logic connective [Online], available: http://arxiv.org/pdf/1105.4618v1.pdf, May 23, 2011
[76] Anthony M, Bartlett P L. Neural Network Learning: Theoretical Foundations. Cambridge: Cambridge University Press, 2009.
[77] Alon N, Ben-David S, Cesa-Bianchi N, Haussler D. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 1997, 44(4): 615-631. doi: 10.1145/263867.263927
[78] Kolmogorov A N, Tihomirov V M. ε-entropy and ε-capacity of sets in functional space. American Mathematical Society Translations, 1961, 17(2): 277-364
[79] Bousquet O. New approaches to statistical learning theory. Annals of the Institute of Statistical Mathematics, 2003, 55(2): 371-389
[80] Wu P C, Hoi S C H, Xia H, Zhao P L, Wang D Y, Miao C Y. Online multimodal deep similarity learning with application to image retrieval. In: Proceedings of the 21st ACM International Conference on Multimedia. New York, USA: ACM, 2013. 153-162
[81] Xia H, Wu P C, Hoi S C H. Online multi-modal distance learning for scalable multimedia retrieval. In: Proceedings of the 6th International Conference on Web Search and Data Mining. New York, USA: ACM, 2013. 455-464
[82] Krzyzak A, Linder T. Radial basis function networks and complexity regularization in function learning. IEEE Transactions on Neural Networks, 1998, 9(2): 247-256. doi: 10.1109/72.661120
[83] Niyogi P, Girosi F. On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Computation, 1996, 8(4): 819-842. doi: 10.1162/neco.1996.8.4.819
[84] Park J, Sandberg I W. Universal approximation using radial-basis-function networks. Neural Computation, 1991, 3(2): 246-257. doi: 10.1162/neco.1991.3.2.246
[85] Girosi F. Approximation error bounds that use VC-bounds [Online], available: https://www.researchgate.net/publication/2782224_Approximation_Error_Bounds_That_Use_Vc-Bounds, February 18, 2013
[86] Györfi L, Kohler M, Krzyżak A, Walk H. A Distribution-Free Theory of Nonparametric Regression. Berlin: Springer-Verlag, 2010.
[87] Haussler D. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 1992, 100(1): 78-150. doi: 10.1016/0890-5401(92)90010-D
[88] Yu B. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 1994, 22(1): 94-116. doi: 10.1214/aop/1176988849
[89] Meir R. Nonparametric time series prediction through adaptive model selection. Machine Learning, 2000, 39(1): 5-34. doi: 10.1023/A:1007602715810
[90] Bartlett P L. Learning with a slowly changing distribution. In: Proceedings of the 5th Annual Workshop on Computational Learning Theory. New York, USA: ACM, 1992. 243-252
[91] Mansour Y, Mohri M, Rostamizadeh A. Domain adaptation: learning bounds and algorithms. In: Proceedings of the 22nd Annual Conference on Learning Theory. Montreal, QC: Morgan Kaufmann Publishers, 2009.
[92] Anguita D, Ghio A, Oneto L, Ridella S. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(9): 1390-1406. doi: 10.1109/TNNLS.2012.2202401
[93] El-Yaniv R, Pechyony D. Transductive Rademacher complexity and its applications. Journal of Artificial Intelligence Research, 2007, 35: 193-234
[94] Tolstikhin I, Blanchard G, Kloft M. Localized complexities for transductive learning [Online], available: http://arxiv.org/pdf/1411.7200.pdf, November 26, 2014
[95] Tolstikhin I, Zhivotovskiy N, Blanchard G. Permutational Rademacher complexity. Algorithmic Learning Theory. Switzerland: Springer International Publishing, 2015. 209-223
[96] Chazottes J R, Collet P, Külske C, Redig F. Concentration inequalities for random fields via coupling. Probability Theory and Related Fields, 2007, 137(1): 201-225
[97] Belomestny D, Spokoiny V. Concentration inequalities for smooth random fields. Theory of Probability and Its Applications, 2014, 58(2): 314-323. doi: 10.1137/S0040585X9798659X
[98] Zhu X J, Rogers T T, Gibson B R. Human Rademacher complexity. In: Proceedings of the 2009 Advances in Neural Information Processing Systems 22. Vancouver, BC, Canada: Curran Associates, Inc., 2009. 2322-2330
[99] Vahdat M, Oneto L, Ghio A, Anguita D, Funk M, Rauterberg M. Human algorithmic stability and human Rademacher complexity. In: Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges, Belgium: Katholieke Universiteit Leuven, 2015. 313-318
[100] Nolan D, Pollard D. U-processes: rates of convergence. The Annals of Statistics, 1987, 15(2): 780-799. doi: 10.1214/aos/1176350374
[101] Karlin S, Taylor H M. A First Course in Stochastic Processes (2nd edition). New York: Academic Press, 1975.