
An Improved Co-training Algorithm Based on Sample Conditional Value

CHENG Sheng-Jun, LIU Jia-Feng, HUANG Qing-Cheng, TANG Xiang-Long

Citation: CHENG Sheng-Jun, LIU Jia-Feng, HUANG Qing-Cheng, TANG Xiang-Long. Conditional Value-based Co-training. ACTA AUTOMATICA SINICA, 2013, 39(10): 1665-1673. doi: 10.3724/SP.J.1004.2013.01665


doi: 10.3724/SP.J.1004.2013.01665

Detailed information
    Author biography:

    LIU Jia-Feng  Associate professor at the School of Computer Science and Technology, Harbin Institute of Technology. His research interests cover pattern recognition, machine learning, image processing, image understanding, and machine vision. E-mail: jefferyliu@hit.edu.cn

Conditional Value-based Co-training

Funds: 

Supported by National Natural Science Foundation of China (61173087, 61073128), Natural Science Foundation of Heilongjiang Province (F201021)

  • Abstract: Co-training is a mainstream semi-supervised learning algorithm. In Co-training, the classifiers trained on two views iteratively select newly labeled samples from the unlabeled set for each other, so as to augment each other's training set. Standard Co-training selects new samples according to the classifiers' posterior probability outputs, a strategy that ignores each sample's value to the current classifier. To address this problem, this paper proposes an improved Co-training-style algorithm, CVCOT (Conditional value-based co-training), which optimizes Co-training with a selection strategy based on sample conditional value. By defining the conditional value of unlabeled samples, the classifier on each view selects new samples according to their conditional value and uses them to update the training set. This strategy both guarantees the labeling reliability of the new samples and preferentially adds high-value, information-rich samples to the training set, which effectively optimizes the classifiers. Experimental results on UCI datasets and a web-page classification task show that CVCOT achieves good classification performance and learning efficiency.
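The selection loop described in the abstract can be sketched in a few lines. The paper's exact definition of conditional value is not given here, so this toy sketch uses an illustrative proxy: the peer view's prediction confidence stands in for labeling reliability, and the receiving view's predictive entropy stands in for informativeness. The data generator, the nearest-centroid classifier, and the top-5 selection size are all assumptions made for the demonstration, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-view data: two Gaussian classes per view, labels shared across views.
def make_view(n_per_class, shift):
    x0 = rng.normal(-shift, 1.0, size=(n_per_class, 2))
    x1 = rng.normal(+shift, 1.0, size=(n_per_class, 2))
    return np.vstack([x0, x1])

n = 100
y_true = np.array([0] * n + [1] * n)
Xa = make_view(n, 1.5)  # view A features
Xb = make_view(n, 1.5)  # view B features

labeled = {0: 0, 1: 0, n: 1, n + 1: 1}        # index -> (pseudo)label
unlabeled = set(range(2 * n)) - set(labeled)

def centroids(X, lab):
    # Nearest-centroid "classifier": one mean vector per class.
    return {c: X[[i for i, l in lab.items() if l == c]].mean(axis=0)
            for c in (0, 1)}

def posterior(x, cents):
    # Softmax over negative distances: a crude class-posterior estimate.
    d = np.array([np.linalg.norm(x - cents[c]) for c in (0, 1)])
    e = np.exp(-d)
    return e / e.sum()

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

# Co-training loop with a value-based selection strategy: each view ranks
# unlabeled samples by (peer confidence) x (own uncertainty) and adds the
# top-valued ones, labeled by the peer view, to the shared training set.
for _ in range(10):
    for X_self, X_peer in ((Xa, Xb), (Xb, Xa)):
        c_self = centroids(X_self, labeled)
        c_peer = centroids(X_peer, labeled)
        scored = []
        for i in unlabeled:
            p_peer = posterior(X_peer[i], c_peer)
            p_self = posterior(X_self[i], c_self)
            value = p_peer.max() * entropy(p_self)  # illustrative proxy
            scored.append((value, i, int(p_peer.argmax())))
        scored.sort(reverse=True)
        for _, i, lab in scored[:5]:              # take the top-5 valued samples
            labeled[i] = lab
            unlabeled.discard(i)

# Evaluate the final view-A classifier on all samples.
c_final = centroids(Xa, labeled)
pred = np.array([int(posterior(x, c_final).argmax()) for x in Xa])
acc = (pred == y_true).mean()
```

On this well-separated toy problem the augmented training set grows each round and the resulting classifier separates the two classes well; the point of the sketch is only the ranking step, where sample value rather than raw posterior confidence drives selection.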
Publication history
  • Received:  2012-05-08
  • Revised:  2012-08-02
  • Published:  2013-10-20
