A New Cross-multidomain Classification Algorithm and Its Fast Version for Large Datasets

GU Xin, WANG Shi-Tong, XU Min

Citation: GU Xin, WANG Shi-Tong, XU Min. A New Cross-multidomain Classification Algorithm and Its Fast Version for Large Datasets. ACTA AUTOMATICA SINICA, 2014, 40(3): 531-547. doi: 10.3724/SP.J.1004.2014.00531

Funds:

Supported by National Natural Science Foundation of China (60903100, 60975027)

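    About the authors:

    WANG Shi-Tong Professor, senior member of the China Computer Federation. His research interests include artificial intelligence, pattern recognition, data mining, neural networks, fuzzy systems, medical image processing, and bioinformatics. E-mail: wxwangst@yahoo.com.cn

    Corresponding author:

    GU Xin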


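  • Abstract: Cross-domain learning and classification aims to transfer the results of supervised learning on multiple source domains effectively to a target domain, so that the unlabeled target-domain data can be classified. Current cross-domain learning methods generally focus on learning from a single source domain to the target domain, and the sample sizes they handle are usually small. Such methods adapt poorly across domains and are even less able to cope with large datasets, which directly limits the classification accuracy and efficiency of cross-domain learning. To make as much use as possible of the useful data in related domains, this paper proposes a multiple sources cross-domain classification algorithm (MSCC). MSCC builds multiple source-domain classifiers based on the logistic regression model and a consensus method, both of which have been shown to be effective in numerous experiments, and combines them to guide the classification of target-domain data. To exploit large source-domain datasets fully and efficiently and to support fast computation on large samples, this paper further combines MSCC with the recent CDdual (dual coordinate descent method) algorithm to obtain MSCC-CDdual, a fast version of MSCC, and gives the related theoretical analysis. Experimental results on synthetic, text, and image datasets show that the algorithm achieves high classification accuracy, fast running speed, and strong domain adaptability on large datasets. The main contributions of this paper are threefold: 1) a new consensus method for multi-source cross-domain classification is proposed, which makes it possible to develop MSCC into the fast MSCC-CDdual algorithm; 2) the fast MSCC-CDdual algorithm is proposed, which is suitable both for small datasets and for large datasets; 3) on high-dimensional datasets, MSCC-CDdual shows distinctive advantages over the other algorithms compared.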
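The abstract combines two ingredients: logistic regression classifiers trained on each source domain, whose outputs are pooled to label target-domain data, and the dual coordinate descent (CDdual) solver of Yu, Huang and Lin (Machine Learning, 2011) for training those classifiers quickly. The Python sketch below illustrates both under stated assumptions; it is not the authors' MSCC-CDdual. Assumptions: the pooling step is plain averaging of predicted probabilities across source models, a hypothetical stand-in, because this page does not describe the paper's consensus method. Each one-variable dual sub-problem is solved by bisection, a simpler substitute for the modified Newton step used by Yu et al. There is no separate bias term; a constant feature 1.0 plays that role and is regularized along with the other weights. All function names are hypothetical.

```python
import math
import random


def _solve_coordinate(alpha_i, a, b, C, iters=60):
    """Minimize the one-variable dual sub-problem over t = new alpha_i in (0, C).

    The derivative log(t / (C - t)) + a * (t - alpha_i) + b is strictly
    increasing in t, so bisection on it converges to the unique minimizer.
    (Bisection is an assumed stand-in for the modified Newton step of Yu et al.)
    """
    lo, hi = 0.0, C
    t = 0.5 * C
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if not (0.0 < mid < C):  # interval collapsed onto a bound in floating point
            break
        t = mid
        grad = math.log(t) - math.log(C - t) + a * (t - alpha_i) + b
        if grad > 0.0:
            hi = t
        else:
            lo = t
    return t


def train_lr_dual_cd(X, y, C=1.0, epochs=50, seed=0):
    """L2-regularized logistic regression (no bias term) via dual coordinate descent.

    Primal: min_w 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i * w.x_i)), y_i in {+1, -1}
    Dual:   min_a 0.5*a'Qa + sum_i [a_i*log(a_i) + (C - a_i)*log(C - a_i)],
            0 < a_i < C, Q_ij = y_i*y_j*x_i.x_j, and w = sum_i a_i*y_i*x_i.
    """
    n, d = len(X), len(X[0])
    alpha = [0.5 * C] * n  # strictly interior starting point
    w = [0.0] * d
    for xi, yi, ai in zip(X, y, alpha):
        for j in range(d):
            w[j] += ai * yi * xi[j]
    q_diag = [sum(v * v for v in xi) for xi in X]  # Q_ii = ||x_i||^2 since y_i^2 = 1
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)  # random coordinate order on each pass
        for i in order:
            b = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))  # (Q alpha)_i
            new_alpha = _solve_coordinate(alpha[i], q_diag[i], b, C)
            delta = new_alpha - alpha[i]
            for j in range(d):  # keep w = sum_i alpha_i*y_i*x_i in sync
                w[j] += delta * y[i] * X[i][j]
            alpha[i] = new_alpha
    return w


def sigmoid(m):
    """Numerically stable logistic function."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)


def consensus_predict(models, x):
    """Hypothetical pooling: average P(y=+1 | x) over source models, threshold 0.5."""
    p = sum(sigmoid(sum(wj * xj for wj, xj in zip(w, x))) for w in models) / len(models)
    return (1 if p >= 0.5 else -1), p


if __name__ == "__main__":
    # Two toy "source domains"; the last feature is a constant 1 acting as a bias.
    source1 = ([[2.0, 0.0, 1.0], [1.5, 0.5, 1.0], [-2.0, 0.0, 1.0], [-1.5, -0.5, 1.0]], [1, 1, -1, -1])
    source2 = ([[0.0, 2.0, 1.0], [0.5, 1.5, 1.0], [0.0, -2.0, 1.0], [-0.5, -1.5, 1.0]], [1, 1, -1, -1])
    models = [train_lr_dual_cd(X, y, C=1.0) for X, y in (source1, source2)]
    for x in ([1.5, 1.5, 1.0], [-1.5, -1.5, 1.0]):  # unlabeled target-domain points
        print(x, consensus_predict(models, x))
```

The dual form suits large datasets because each update touches a single training example and keeps `w` synchronized incrementally, so every step costs O(d) for d features. This is the property that the abstract's fast variant builds on.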
Publication history
  • Received: 2012-06-25
  • Revised: 2013-02-04
  • Published: 2014-03-20
