
Zero-order TSK-type Fuzzy System for Imbalanced Data Classification

GU Xiao-Qing, JIANG Yi-Zhang, WANG Shi-Tong

Citation: GU Xiao-Qing, JIANG Yi-Zhang, WANG Shi-Tong. Zero-order TSK-type Fuzzy System for Imbalanced Data Classification. Acta Automatica Sinica, 2017, 43(10): 1773-1788. doi: 10.16383/j.aas.2017.c160200

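Funds:

National Natural Science Foundation of China 61572236

Fundamental Research Funds for the Central Universities JUSRP51614A

Natural Science Foundation of Jiangsu Province under Grant BK20160187

National Natural Science Foundation of China 61502058

National Natural Science Foundation of China 61572085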

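More Information
    Author Bio:

    JIANG Yi-Zhang  Lecturer at the School of Digital Media, Jiangnan University. He received his Ph.D. degree from the School of Digital Media, Jiangnan University in 2016. His research interests cover artificial intelligence, pattern recognition, and fuzzy systems. E-mail: s101914015@vip.jiangnan.edu.cn

    WANG Shi-Tong  Professor at the School of Digital Media, Jiangnan University. His research interests cover artificial intelligence, neural networks, and pattern recognition. E-mail: wxwangst@aliyun.com

    Corresponding author:

    GU Xiao-Qing  Lecturer at the School of Information Science and Engineering, Changzhou University, and Ph.D. candidate at the School of Digital Media, Jiangnan University. Her research interests cover pattern recognition and machine learning. Corresponding author of this paper. E-mail: czxqgu@163.com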

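  • Abstract: When dealing with imbalanced data classification, traditional fuzzy systems have a low recognition rate for minority-class samples. To address this problem, two improvements are made. First, for learning the antecedent parameters, a Bayesian fuzzy clustering based on competitive learning (BFCCL) algorithm is proposed. BFCCL takes into account the repulsion between the cluster centers of samples from different classes, runs by alternating iteration, and obtains the optimal model parameters by Markov chain Monte Carlo methods. Second, for learning the consequent parameters, a large-margin strategy is adopted in which parameter tuning makes the distance from the minority class to the classification hyperplane larger than the distance from the majority class to it; this effectively corrects the shift of the classification hyperplane. Building on these ideas, and taking the zero-order TSK-type fuzzy system as the concrete object of study, a zero-order TSK-type fuzzy system for imbalanced data classification (0-TSK-IDC) is constructed. Experiments on synthetic and real-world medical datasets show that 0-TSK-IDC achieves high recognition rates on both the minority and majority classes in imbalanced classification, and has good robustness and interpretability.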
    1)  Recommended by Associate Editor WANG Li-Wei
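The abstract states only that BFCCL alternates its updates, adds a repulsion between the cluster centers of different classes, and obtains its parameters by Markov chain Monte Carlo. The paper's actual posterior is not given in this excerpt, so the following is a toy sketch under stated assumptions: memberships are held fixed (one half of an alternating scheme), a hypothetical `energy` combines fuzzy within-cluster scatter with a bounded Gaussian-shaped penalty for approaching other-class centers, and a random-walk Metropolis-Hastings chain samples a single center from exp(-energy/T), keeping the lowest-energy sample it visits. The names `energy`, `mh_sample_center`, `lam`, `width` and `temperature` are illustrative, not taken from the paper.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def energy(center, points, memberships, rival_centers, m=2.0, lam=1.0, width=1.0):
    """Hypothetical energy (not the paper's posterior): fuzzy scatter of the
    cluster plus a bounded penalty that grows as `center` nears a rival
    (other-class) center, which acts as a repulsion between the two."""
    scatter = sum(u ** m * sq_dist(center, x) for x, u in zip(points, memberships))
    repulsion = sum(math.exp(-sq_dist(center, r) / width) for r in rival_centers)
    return scatter + lam * repulsion


def mh_sample_center(points, memberships, rival_centers, init, n_steps=2000,
                     step=0.1, temperature=1.0, lam=1.0, seed=0):
    """Random-walk Metropolis-Hastings targeting exp(-energy / temperature).
    Returns the lowest-energy center visited and its energy."""
    rng = random.Random(seed)
    current = list(init)
    e_cur = energy(current, points, memberships, rival_centers, lam=lam)
    best, e_best = list(current), e_cur
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = [c + rng.gauss(0.0, step) for c in current]
        e_prop = energy(proposal, points, memberships, rival_centers, lam=lam)
        # Accept downhill moves always, uphill moves with prob exp(-dE / T).
        accept = e_prop <= e_cur or rng.random() < math.exp((e_cur - e_prop) / temperature)
        if accept:
            current, e_cur = proposal, e_prop
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0)]
    center, e = mh_sample_center(minority, [1.0] * 4, rival_centers=[(3.0, 3.0)],
                                 init=(0.0, 0.0), temperature=0.1)
    print(center, e)
```

Lowering `temperature` concentrates the chain near the energy minimum; raising `lam` or moving a rival center closer pushes the sampled center away from that rival.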
  • Fig. 1  The construction principle of BFCCL clustering

    Fig. 2  The parameter learning strategy of BFCCL

    Fig. 3  The classification hyperplane of 0-TSK-IDC

    Fig. 4  Clustering results of BFC on the Banana dataset with three clusters for each of the positive and negative classes

    Fig. 5  Clustering results of BFCCL on the Banana dataset with three clusters for each of the positive and negative classes

    Fig. 6  Clustering results of BFC on the Banana dataset with four clusters for each of the positive and negative classes

    Fig. 7  Clustering results of BFCCL on the Banana dataset with four clusters for each of the positive and negative classes

    Fig. 8  Fuzzy sets of 0-TSK-IDC obtained from the clustering result in Fig. 7(b)

    Fig. 9  Comparison of G-mean (with standard deviation) of 0-TSK-IDC and other algorithms on the UCI medical datasets

    Fig. 10  Comparison of F-measure (with standard deviation) of 0-TSK-IDC and other algorithms on the UCI medical datasets

    Fig. 11  G-mean versus the number of fuzzy rules on the UCI medical datasets
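Figures 3 and 8 show the classification surface and the fuzzy sets of 0-TSK-IDC. To illustrate how a zero-order TSK classifier produces its output, and the abstract's idea of demanding a larger margin for the minority class than for the majority class, here is a sketch under assumptions this excerpt does not confirm: Gaussian membership functions with a product t-norm, normalized firing strengths, one constant consequent per rule, labels +1 (minority) and -1 (majority), and consequents fitted by subgradient descent on an L2-regularized hinge loss whose required margin is `rho_pos` for minority samples and `rho_neg` for majority samples. This is a swapped-in stand-in, not the paper's consequent optimization; `firing_strengths`, `tsk0_output`, `train_consequents` and their parameters are hypothetical names.

```python
import math


def firing_strengths(x, centers, widths):
    """Normalized firing strength of each rule: Gaussian membership functions
    combined with a product t-norm (an assumption, see lead-in)."""
    raw = [
        math.exp(-sum((xi - ci) ** 2 / (2.0 * si ** 2) for xi, ci, si in zip(x, c, s)))
        for c, s in zip(centers, widths)
    ]
    total = sum(raw) or 1.0  # guard against all strengths underflowing to 0
    return [r / total for r in raw]


def tsk0_output(x, centers, widths, consequents):
    """Zero-order TSK output: firing-strength-weighted average of rule constants."""
    mu = firing_strengths(x, centers, widths)
    return sum(m * p for m, p in zip(mu, consequents))


def train_consequents(X, y, centers, widths, rho_pos=2.0, rho_neg=1.0,
                      reg=0.01, lr=0.1, epochs=200):
    """Subgradient descent on (reg/2)*||p||^2 + mean_i max(0, rho_i - y_i*f(x_i)),
    where rho_i = rho_pos for minority (+1) and rho_neg for majority (-1) samples."""
    feats = [firing_strengths(x, centers, widths) for x in X]
    n = len(X)
    p = [0.0] * len(centers)
    for _ in range(epochs):
        grad = [reg * pk for pk in p]  # gradient of the L2 term
        for mu, label in zip(feats, y):
            rho = rho_pos if label > 0 else rho_neg
            if label * sum(m * pk for m, pk in zip(mu, p)) < rho:  # margin violated
                for k, mk in enumerate(mu):
                    grad[k] -= label * mk / n
        p = [pk - lr * g for pk, g in zip(p, grad)]
    return p


if __name__ == "__main__":
    centers, widths = [(-2.0,), (2.0,)], [(1.0,), (1.0,)]
    X = [(-2.5,), (-2.0,), (-1.5,), (-1.0,), (1.8,), (2.2,)]
    y = [-1, -1, -1, -1, 1, 1]
    p = train_consequents(X, y, centers, widths)
    print(p, tsk0_output((0.0,), centers, widths, p))
```

With `rho_pos > rho_neg`, the point midway between the two rule centers (x = 0) receives a positive output, so the decision boundary moves toward the majority class; the margin values used here are arbitrary.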

    Table 1  Basic information of the datasets

    Dataset | Positive samples | Negative samples | Positive:negative ratio | Attributes
    Banana | 600, 200, 100 | 1500 | 2:5, 2:15, 1:15 | 2
    Heart statlog | 120, 60, 30, 20, 12 | 170 | 12:17, 6:17, 3:17, 2:17, 6:85 | 13
    Breast wisconsin | 241, 200, 150, 100, 40 | 458 | 241:458, 100:229, 75:229, 50:229, 20:229 | 10
    Liver disorders | 145, 100, 50, 20 | 200 | 29:40, 1:2, 1:4, 1:10 | 7
    Haberman | 81, 40, 25 | 225 | 9:25, 8:45, 1:9 | 3
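Each ratio in Table 1 is the positive-class count over the negative-class count reduced to lowest terms (for example, 12:170 becomes 6:85). A short check of that arithmetic; the `TABLE1` dictionary and `class_ratio` helper are illustrative names, and the counts are copied from the table.

```python
from math import gcd

# Positive-class sizes and negative-class size per dataset, copied from Table 1.
TABLE1 = {
    "Banana": ([600, 200, 100], 1500),
    "Heart statlog": ([120, 60, 30, 20, 12], 170),
    "Breast wisconsin": ([241, 200, 150, 100, 40], 458),
    "Liver disorders": ([145, 100, 50, 20], 200),
    "Haberman": ([81, 40, 25], 225),
}


def class_ratio(n_pos, n_neg):
    """Return the positive:negative ratio in lowest terms, e.g. '2:5'."""
    g = gcd(n_pos, n_neg)
    return f"{n_pos // g}:{n_neg // g}"


if __name__ == "__main__":
    for name, (positives, n_neg) in TABLE1.items():
        print(name, ", ".join(class_ratio(n, n_neg) for n in positives))
```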

    Table 3  Comparison of G-mean, F-measure and their standard deviations of 0-TSK-IDC on the UCI medical datasets when the rule antecedents are obtained by BFC and BFCCL, respectively

    Dataset | Pos:neg ratio | BFC G-mean (%) | BFC F-measure (%) | BFCCL G-mean (%) | BFCCL F-measure (%)
    Heart | 12:17 | 87.01 (1.69) | 86.87 (1.78) | 89.56 (1.91) | 89.36 (1.90)
    Heart | 6:17 | 86.24 (2.00) | 85.48 (1.88) | 88.14 (1.89) | 87.88 (1.92)
    Heart | 3:17 | 85.41 (1.97) | 84.08 (2.00) | 87.29 (2.01) | 86.40 (2.00)
    Heart | 2:17 | 82.50 (2.30) | 80.02 (2.31) | 85.71 (2.24) | 83.63 (2.23)
    Heart | 6:85 | 81.05 (2.17) | 75.37 (2.19) | 84.65 (2.11) | 78.50 (2.04)
    Breast | 241:458 | 93.62 (2.60) | 91.46 (2.57) | 96.56 (2.34) | 95.03 (2.20)
    Breast | 100:229 | 91.14 (2.05) | 90.02 (2.55) | 95.59 (1.97) | 94.24 (2.01)
    Breast | 75:229 | 90.37 (2.00) | 89.14 (2.04) | 93.75 (1.88) | 91.22 (1.89)
    Breast | 50:229 | 87.99 (1.90) | 85.73 (1.89) | 91.59 (2.03) | 89.29 (2.10)
    Breast | 20:229 | 84.21 (2.23) | 81.26 (2.21) | 87.56 (2.00) | 86.05 (1.99)
    Liver | 29:40 | 70.38 (0.80) | 66.28 (0.82) | 72.50 (0.77) | 68.51 (0.76)
    Liver | 1:2 | 69.77 (0.75) | 61.27 (0.75) | 71.15 (0.69) | 62.50 (0.60)
    Liver | 1:4 | 67.82 (0.79) | 52.98 (0.78) | 70.24 (0.73) | 55.22 (0.79)
    Liver | 1:10 | 65.08 (0.81) | 47.31 (0.83) | 67.18 (0.75) | 50.65 (0.72)
    Haberman | 9:25 | 76.05 (1.73) | 52.75 (1.73) | 76.56 (1.60) | 53.61 (1.60)
    Haberman | 8:45 | 68.02 (1.86) | 51.09 (1.85) | 68.97 (1.85) | 52.60 (1.87)
    Haberman | 1:9 | 64.21 (1.69) | 48.22 (1.73) | 65.42 (1.74) | 50.01 (1.69)
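Tables 2 to 4 and Figs. 9 and 10 report G-mean and F-measure, which this excerpt does not define. The sketch below uses the standard definitions, assuming the minority class is the positive class: G-mean is the geometric mean of the true-positive rate and the true-negative rate, and F-measure is the harmonic mean of precision and recall on the positive class. The toy example shows why both are preferred to accuracy here: predicting every sample as majority gives 75% accuracy but a G-mean and F-measure of 0. The function names are illustrative.

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Counts (tp, fn, fp, tn) with `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    tp, fn, fp, tn = confusion(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (minority recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0  # specificity (majority recall)
    return math.sqrt(tpr * tnr)


def f_measure(y_true, y_pred, positive=1):
    tp, fn, fp, tn = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = [1] * 4 + [0] * 12
    y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 10   # 3 TP, 1 FN, 2 FP, 10 TN
    print(round(g_mean(y_true, y_pred), 4), round(f_measure(y_true, y_pred), 4))
    # prints 0.7906 0.6667
    all_majority = [0] * 16                      # 75% accuracy, no minority hits
    print(g_mean(y_true, all_majority), f_measure(y_true, all_majority))
    # prints 0.0 0.0
```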

    Table 2  Comparison of G-mean, F-measure and their standard deviations of 0-TSK-IDC on the Banana dataset, using the BFC and BFCCL clustering results in Figs. 4-7

    Rules | Pos:neg ratio | BFC G-mean (%) | BFC F-measure (%) | BFCCL G-mean (%) | BFCCL F-measure (%)
    6 | 2:5 | 96.48 (0.60) | 96.2 (0.64) | 96.97 (0.53) | 96.44 (0.54)
    6 | 2:15 | 95.89 (0.52) | 95.77 (0.49) | 96.20 (0.31) | 96.22 (0.36)
    6 | 1:15 | 94.14 (0.47) | 93.79 (0.41) | 95.45 (0.55) | 94.99 (0.48)
    8 | 2:5 | 97.98 (0.31) | 97.23 (0.34) | 99.75 (0.27) | 99.74 (0.23)
    8 | 2:15 | 97.03 (0.29) | 96.92 (0.29) | 99.32 (0.34) | 99.32 (0.35)
    8 | 1:15 | 96.76 (0.36) | 96.65 (0.32) | 98.68 (0.30) | 98.63 (0.33)

    Table 4  Comparison of G-mean, F-measure and their standard deviations of 0-TSK-IDC and other algorithms on the Banana dataset

    Pos:neg ratio | Algorithm | G-mean (%) | F-measure (%)
    2:5 | FS-FCSVM | 95.90 (0.89) | 95.31 (0.84)
    2:5 | L2-TSK-FS | 96.26 (0.47) | 95.48 (0.45)
    2:5 | BFCCL-TSK-FS | 96.79 (0.50) | 96.14 (0.41)
    2:5 | AdaBoost | 98.77 (0.87) | 98.71 (0.90)
    2:5 | CS-SVM | 98.98 (0.33) | 98.91 (0.30)
    2:5 | 0-TSK-IDC | 99.75 (0.27) | 99.74 (0.23)
    2:15 | FS-FCSVM | 90.53 (0.65) | 89.46 (0.60)
    2:15 | L2-TSK-FS | 89.23 (0.71) | 88.47 (0.72)
    2:15 | BFCCL-TSK-FS | 92.70 (0.58) | 92.26 (0.62)
    2:15 | AdaBoost | 97.92 (0.64) | 97.75 (0.68)
    2:15 | CS-SVM | 98.22 (0.37) | 98.05 (0.36)
    2:15 | 0-TSK-IDC | 99.32 (0.34) | 99.32 (0.35)
    1:15 | FS-FCSVM | 86.06 (0.81) | 82.83 (0.84)
    1:15 | L2-TSK-FS | 87.95 (0.55) | 84.67 (0.54)
    1:15 | BFCCL-TSK-FS | 88.84 (0.43) | 86.33 (0.49)
    1:15 | AdaBoost | 97.46 (0.58) | 97.28 (0.52)
    1:15 | CS-SVM | 97.79 (0.74) | 97.61 (0.70)
    1:15 | 0-TSK-IDC | 98.68 (0.30) | 98.63 (0.33)
Publication history
  • Received:  2016-02-29
  • Accepted:  2016-08-02
  • Published:  2017-10-20
