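An Improved Criterion for Evaluating the Discernibility of a Feature Subset

XIE Juan-Ying^1, WU Zhao-Zhong^1, ZHENG Qing-Quan^1, WANG Ming-Zhao^{1,2}

1. School of Computer Science, Shaanxi Normal University, Xi'an 710119
2. College of Life Sciences, Shaanxi Normal University, Xi'an 710119

doi: 10.16383/j.aas.c200704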

Funds: Supported by the National Natural Science Foundation of China (62076159, 12031010, 61673251) and the Fundamental Research Funds for the Central Universities (GK202105003)

Author Bio:
XIE Juan-Ying  Professor at the School of Computer Science, Shaanxi Normal University. Her research interests cover machine learning, data mining, and biomedical big data analysis. Corresponding author of this paper. E-mail: xiejuany@snnu.edu.cn

WU Zhao-Zhong  Master's student at the School of Computer Science, Shaanxi Normal University. His research interests cover machine learning and biomedical data analysis. E-mail: wzz@snnu.edu.cn

ZHENG Qing-Quan  Master's student at the School of Computer Science, Shaanxi Normal University. His research interests cover data mining and biomedical data analysis. E-mail: zhengqingqsnnu@163.com

WANG Ming-Zhao  Ph.D. candidate at the College of Life Sciences, Shaanxi Normal University. He received his master's degree from the School of Computer Science, Shaanxi Normal University in 2017. His main research interest is bioinformatics. E-mail: wangmz2017@snnu.edu.cn

Publication history
- Received: 2020-09-01
- Revised: 2021-03-02
- Published online: 2021-04-25
- Issue date: 2022-05-13

Abstract

The discernibility of feature subsets (DFS) criterion does not account for the influence of different feature measurement scales on the discernibility of a feature subset. To overcome this deficiency, this paper introduces the coefficient of variation and proposes the generalized discernibility of feature subsets (GDFS) criterion. GDFS is combined with four search strategies, namely sequential forward search (SFS), sequential backward search (SBS), sequential forward floating search (SFFS) and sequential backward floating search (SBFS), to obtain four hybrid feature selection algorithms, with the extreme learning machine (ELM) as the classifier that guides the feature selection process. The classification capability of the feature subsets selected by GDFS is tested on datasets from the UCI machine learning repository and on classic gene expression datasets. The ELM classifiers built on the feature subsets selected by GDFS are compared with those built on the subsets selected by DFS and by the classic feature selection algorithms Relief, DRJMIM, mRMR, LLE Score, AVC, SVM-RFE, VMI_naive, AMID, AMID-DWSFS, CFR, and FSSC-SD, and statistical significance tests are conducted among all of these algorithms. The experimental results show that the proposed GDFS is superior to the original DFS and can select feature subsets with better classification capability.

Key words:
- discernibility of a feature subset
- feature selection
- coefficient of variation
- extreme learning machine
- feature search strategies
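
The role of the coefficient of variation can be illustrated with a minimal sketch. It does not reproduce the GDFS formula, which is not given in this section; it only shows that the coefficient of variation (standard deviation divided by the mean) is unchanged when a feature is re-expressed in another unit, whereas the variance is not. The function name, the sample values, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean.

    Assumes the mean is non-zero; the choice of the population
    (rather than sample) standard deviation is an illustrative assumption.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / abs(mean)


# Hypothetical feature measured in metres, then the same feature in centimetres.
heights_m = [1.60, 1.70, 1.80, 1.75]
heights_cm = [h * 100 for h in heights_m]

var_m = statistics.pvariance(heights_m)
var_cm = statistics.pvariance(heights_cm)
cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)

# The variance grows by the square of the unit factor; the CV stays the same.
print(f"variance ratio (cm / m): {var_cm / var_m:.1f}")
print(f"CV in metres: {cv_m:.6f}, CV in centimetres: {cv_cm:.6f}")
```

A criterion built on raw variances would therefore weight the centimetre version of the feature far more heavily than the metre version, which is the kind of scale effect the abstract says GDFS is designed to remove.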


Fig. 1 The 5-fold cross-validation experimental results of DFS+SFS

Fig. 2 The 5-fold cross-validation experimental results of DFS+SBS

Fig. 3 The 5-fold cross-validation experimental results of DFS+SFFS

Fig. 4 The 5-fold cross-validation experimental results of DFS+SBFS

Fig. 5 Nemenyi test results of the 13 feature selection algorithms in terms of the performance metrics of the ELM classifiers built on their selected features
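
For readers unfamiliar with the Nemenyi test reported in Fig. 5, the standard critical difference (CD) formula is worked through below. Two algorithms are judged significantly different when their average ranks differ by at least CD. The value k = 13 comes from the figure caption; N = 6 is an assumption that the test was run over the six gene datasets; the critical value q_α depends on k and the significance level and is left symbolic because this section does not state it.

```latex
\[
  \mathrm{CD} = q_{\alpha}\sqrt{\frac{k(k+1)}{6N}}
\]
% With k = 13 algorithms and (assumed) N = 6 datasets:
\[
  \mathrm{CD} = q_{\alpha}\sqrt{\frac{13 \times 14}{6 \times 6}}
              = q_{\alpha}\sqrt{\frac{182}{36}}
              \approx 2.25\, q_{\alpha}
\]
```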

Table 1 Descriptions of the UCI datasets used in the experiments

Dataset	#Samples	#Features	#Classes
iris	150	4	3
thyroid-disease	215	5	3
glass	214	9	2
wine	178	13	3
Heart Disease	297	13	3
WDBC	569	30	2
WPBC	194	33	2
dermatology	358	34	6
ionosphere	351	34	2
Handwrite	323	256	2

Table 2 The 5-fold cross-validation experimental results of the GDFS+SFS and DFS+SFS algorithms

Dataset	#Original features	#Selected features (GDFS)	#Selected features (DFS)	Test accuracy (GDFS)	Test accuracy (DFS)
iris	4	2.2	3	0.9733	0.9667
thyroid-disease	5	1.4	1.6	0.9163	0.9070
glass	9	2.4	3.2	0.9346	0.9439
wine	13	3.6	3.6	0.9272	0.8925
Heart Disease	13	2.8	3.4	0.5889	0.5654
WDBC	30	3.4	6.2	0.9227	0.9193
WPBC	33	1.8	2	0.7835	0.7732
dermatology	34	4.6	5	0.7151	0.6938
ionosphere	34	4.4	3	0.9029	0.8717
Handwrite	256	7.4	7.2	0.9657	0.9440
Average	43.1	3.4	3.82	0.8630	0.8478

Table 3 The 5-fold cross-validation experimental results of the GDFS+SBS and DFS+SBS algorithms

Dataset	#Original features	#Selected features (GDFS)	#Selected features (DFS)	Test accuracy (GDFS)	Test accuracy (DFS)
iris	4	2.6	3.2	0.9867	0.9733
thyroid-disease	5	2.8	3.2	0.9269	0.9070
glass	9	8.2	6.8	0.9580	0.9375
wine	13	12	11.6	0.6855	0.6515
Heart Disease	13	11.8	11.8	0.5490	0.5419
WDBC	30	28	28.8	0.8981	0.8616
WPBC	33	30.8	31.6	0.7785	0.7633
dermatology	34	31	31	0.9443	0.9303
ionosphere	34	31.8	32.2	0.9031	0.8947
Handwrite	256	245	248.6	1	0.9936
Average	43.1	40.4	40.88	0.8630	0.8455

Table 4 The 5-fold cross-validation experimental results of the GDFS+SFFS and DFS+SFFS algorithms

Dataset	#Original features	#Selected features (GDFS)	#Selected features (DFS)	Test accuracy (GDFS)	Test accuracy (DFS)
iris	4	2.8	3	0.9867	0.9667
thyroid-disease	5	2.2	2.2	0.9395	0.9349
glass	9	4.2	4.4	0.9629	0.9442
wine	13	4.2	4.4	0.9261	0.9041
Heart Disease	13	4.4	4.8	0.5928	0.5757
WDBC	30	11	11.4	0.9385	0.9074
WPBC	33	5.8	4.4	0.7943	0.7886
dermatology	34	16.8	17.4	0.9522	0.9552
ionosphere	34	9.6	10.2	0.9173	0.9231
Handwrite	256	42.2	40.8	0.9907	0.9846
Average	43.1	10.32	10.3	0.8992	0.8885

Table 5 The 5-fold cross-validation experimental results of the GDFS+SBFS and DFS+SBFS algorithms

Dataset	#Original features	#Selected features (GDFS)	#Selected features (DFS)	Test accuracy (GDFS)	Test accuracy (DFS)
iris	4	2.4	2.8	0.98	0.9667
thyroid-disease	5	2.4	2.2	0.9395	0.9209
glass	9	5.4	4	0.8979	0.9490
wine	13	9.2	9.4	0.6519	0.6086
Heart Disease	13	5.4	6.4	0.5757	0.5655
WDBC	30	22.8	24.6	0.8911	0.8893
WPBC	33	24.6	25.4	0.7681	0.7319
dermatology	34	28.2	27.2	0.9444	0.9362
ionosphere	34	28.4	26.2	0.9174	0.9087
Handwrite	256	137.4	148	0.9938	0.9722
Average	43.1	26.62	27.62	0.8560	0.8449
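
The sequential forward search (SFS) strategy named in the table captions can be sketched generically as follows. This is a minimal greedy sketch, not the authors' implementation: `score` is a hypothetical stand-in for whatever subset criterion guides the search (for example a discernibility score or cross-validated classifier accuracy), and stopping when no candidate improves the score is an assumed rule.

```python
from typing import Callable, List, Optional, Tuple


def sequential_forward_search(
    n_features: int,
    score: Callable[[List[int]], float],
    max_size: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves `score`; stop when none does."""
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_size is None else max_size

    while remaining and len(selected) < limit:
        # Evaluate every subset obtained by adding one remaining feature.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # assumed stopping rule: no single addition helps
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score

    return selected, best_score


# Hypothetical toy criterion: each feature has a fixed gain, and every
# selected feature costs a fixed penalty, so small subsets are favoured.
GAINS = [0.10, 0.50, 0.05, 0.30]
PENALTY = 0.08


def toy_score(subset: List[int]) -> float:
    return sum(GAINS[f] for f in subset) - PENALTY * len(subset)


subset, value = sequential_forward_search(len(GAINS), toy_score)
print(subset, round(value, 4))
```

Sequential backward search runs the same loop in reverse, starting from all features and removing one at a time, and the floating variants add a conditional step in the opposite direction after each move.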

Table 6 Descriptions of the gene datasets used in the experiments

Dataset	#Samples	#Features	#Classes
Colon	62	2000	2
Prostate	102	12625	2
Myeloma	173	12625	2
Gas2	124	22283	2
SRBCT	83	2308	4
Carcinoma	174	9182	11

Table 7 The 5-fold cross-validation experimental results of all algorithms on the gene datasets in Table 6

Dataset	Algorithm	#Features	Accuracy	AUC	Recall	Precision	F-measure	F2-measure
Colon	GDFS+SFFS	5.2	0.7590	0.8925	0.9	0.7	0.78	0.4133
	DFS+SFFS	5.4	0.7256	0.78	0.8250	0.6856	0.7352	0.2332
	Relief	8	0.7231	0.7575	0.9	0.6291	0.7396	0.16
	DRJMIM	13	0.7282	0.7825	0.8750	0.6642	0.7495	0.3250
	mRMR	5	0.7602	0.7325	0.85	0.6281	0.7185	0.1578
	LLE Score	7	0.7577	0.6563	0.8750	0.6537	0.7431	0.2057
	AVC	2	0.7256	0.7297	0.86	0.6439	0.7256	0.2126
	SVM-RFE	5	0.7577	0.7588	0.75	0.6273	0.6775	0.3260
	VMI_naive	2	0.7423	1	1	0.6462	0.7848	0
	AMID	8	0.7436	0.95	0.95	0.6328	0.7581	0
	AMID-DWSFS	2	0.8397	0.9875	0.9750	0.6688	0.7895	0.1436
	CFR	3	0.7603	0.95	1	0.6462	0.7848	0
	FSSC-SD	2	0.7269	0.9750	0.9750	0.6401	0.7721	0
Prostate	GDFS+SFFS	6.4	0.9305	0.9029	0.8836	0.8836	0.8829	0.8818
	DFS+SFFS	6.6	0.9105	0.9349	0.8816	0.8818	0.8529	0.8497
	Relief	11	0.93	0.8525	0.8255	0.7824	0.7981	0.79
	DRJMIM	9	0.94	0.8629	0.7891	0.8747	0.8216	0.83
	mRMR	12	0.9414	0.7895	0.7327	0.7816	0.7520	0.7597
	LLE Score	26	0.9119	0.6796	0.7291	0.6582	0.6847	0.6616
	AVC	12	0.9514	0.8144	0.7655	0.7598	0.7592	0.7573
	SVM-RFE	22	0.92	0.8453	0.6927	0.8474	0.7567	0.7824
	VMI_naive	9	0.9419	0.8605	0.7655	0.7418	0.7481	0.7580
	AMID	27	0.9314	0.7929	0.7655	0.7936	0.7690	0.7797
	AMID-DWSFS	4	0.9514	0.7251	0.7127	0.7171	0.7011	0.7098
	CFR	7	0.9410	0.7840	0.88	0.7430	0.7922	0.7942
	FSSC-SD	23	0.9024	0.7796	0.8018	0.8205	0.7892	0.8130
Myeloma	GDFS+SFFS	9.6	0.7974	0.6805	0.8971	0.8230	0.8558	0.5463
	DFS+SFFS	9.8	0.7744	0.6296	0.8971	0.8047	0.8474	0.3121
	Relief	23	0.8616	0.6453	0.8693	0.8225	0.8415	0.4631
	DRJMIM	36	0.8559	0.6210	0.8392	0.7881	0.8124	0.2682
	mRMR	12	0.8436	0.6332	0.8095	0.8046	0.8067	0.3539
	LLE Score	64	0.8492	0.6169	0.9127	0.7909	0.8461	0.2313
	AVC	22	0.8329	0.5820	0.8974	0.8098	0.8501	0.3809
	SVM-RFE	20	0.8330	0.6270	0.8971	0.7935	0.8416	0.3846
	VMI_naive	19	0.8383	0.5639	0.8847	0.7902	0.8331	0.2691
	AMID	11	0.8325	0.6743	0.8979	0.8282	0.8603	0.5254
	AMID-DWSFS	38	0.8381	0.6233	0.8381	0.8197	0.8249	0.5224
	CFR	14	0.8504	0.5931	0.9124	0.8014	0.8523	0.3010
	FSSC-SD	15	0.8381	0.6662	0.8754	0.8173	0.8438	0.4992
Gas2	GDFS+SFFS	7.4	0.9840	0.9704	0.9051	0.9846	0.9412	0.9474
	DFS+SFFS	8.4	0.9429	0.9465	0.9064	0.9212	0.9203	0.9018
	Relief	4	0.9763	0.9520	0.8577	0.9316	0.8911	0.9005
	DRJMIM	19	0.9750	0.9004	0.8192	0.8848	0.8449	0.8584
	mRMR	5	0.9756	0.9358	0.8551	0.9131	0.8815	0.8895
	LLE Score	25	0.9769	0.9312	0.8659	0.8748	0.8449	0.8538
	AVC	3	0.9840	0.9073	0.8897	0.9390	0.9122	0.9160
	SVM-RFE	18	0.9756	0.9009	0.8205	0.9052	0.8503	0.8716
	VMI_naive	10	0.9763	0.9425	0.7372	0.9778	0.8311	0.8778
	AMID	16	0.9833	0.9305	0.9205	0.8829	0.8968	0.9013
	AMID-DWSFS	2	0.9840	0.9247	0.8359	0.9424	0.8839	0.8977
	CFR	10	0.9917	0.9080	0.9013	0.8236	0.8432	0.8434
	FSSC-SD	16	0.9596	0.9095	0.8538	0.8758	0.8555	0.8642
SRBCT	GDFS+SFFS	11.6	0.9372	0.9749	0.9567	0.9684	0.9579	0.9573
	DFS+SFFS	11.6	0.9034	0.9130	0.9356	0.9449	0.9452	0.9352
	Relief	10	0.9631	0.9479	0.9439	0.9589	0.9467	0.9390
	DRJMIM	4	0.9389	0.9363	0.9656	0.9511	0.9555	0.9503
	mRMR	8	0.9528	0.9479	0.9283	0.9624	0.9275	0.9294
	LLE Score	11	0.9271	0.8941	0.9333	0.9332	0.9247	0.9154
	AVC	8	0.9042	0.9355	0.9139	0.9544	0.9223	0.9183
	SVM-RFE	13	0.8421	0.9149	0.9128	0.9385	0.9159	0.8240
	VMI_naive	14	0.9409	0.9181	0.9250	0.9429	0.9269	0.9188
	AMID	13	0.9387	0.8999	0.9567	0.9335	0.9407	0.9239
	AMID-DWSFS	9	0.9167	0.8151	0.8178	0.8516	0.82	0.7466
	CFR	8	0.9314	0.6839	0.8994	0.8570	0.8693	0.7150
	FSSC-SD	6	0.8806	0.9096	0.9267	0.9422	0.9284	0.9160
Carcinoma	GDFS+SFFS	23.4	0.7622	0.9037	0.7872	0.7879	0.7839	0.5570
	DFS+SFFS	19.4	0.7469	0.8998	0.7808	0.7869	0.7801	0.6261
	Relief	42	0.7351	0.8701	0.7687	0.7785	0.7680	0.5392
	DRJMIM	13	0.7757	0.8991	0.6742	0.6621	0.6656	0.4557
	mRMR	24	0.8079	0.9188	0.7613	0.7505	0.7533	0.5089
	LLE Score	76	0.6682	0.8452	0.6689	0.6702	0.6663	0.4109
	AVC	77	0.7227	0.8746	0.7872	0.7790	0.7796	0.5068
	SVM-RFE	30	0.7213	0.87	0.7027	0.6933	0.6929	0.4065
	VMI_naive	33	0.7443	0.8784	0.7487	0.7527	0.7441	0.4731
	AMID	42	0.7307	0.8878	0.7295	0.7165	0.7194	0.4841
	AMID-DWSFS	38	0.7412	0.6231	0.7558	0.7447	0.7457	0.4255
	CFR	33	0.7054	0.6216	0.7514	0.74	0.7410	0.5315
	FSSC-SD	21	0.7306	0.8716	0.7039	0.7016	0.6992	0.4344

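A results table of this shape (datasets as rows, algorithms as columns) is the usual input to a Friedman test. The sketch below shows how average ranks and the Friedman chi-square statistic are computed from such a matrix. It is a minimal illustration on a hypothetical 3 x 3 score matrix, not a recomputation of the paper's values; the function names are made up, and no tie correction is applied to the statistic.

```python
def average_ranks(scores):
    """Mean rank of each column; within a row, rank 1 is the highest score.

    Tied scores in a row share the mean of the positions they occupy.
    """
    n_rows, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
            for pos in range(i, j + 1):
                rank_sums[order[pos]] += shared
            i = j + 1
    return [s / n_rows for s in rank_sums]


def friedman_chi2(scores):
    """Friedman statistic without tie correction; df = k - 1."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )


# Hypothetical accuracies: 3 datasets (rows) x 3 algorithms (columns).
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.75],
    [0.60, 0.70, 0.50],
]
toy_ranks = average_ranks(toy_scores)
toy_chi2 = friedman_chi2(toy_scores)
print(toy_ranks, round(toy_chi2, 4))
```

With 13 algorithms the statistic has 12 degrees of freedom, which matches the df row reported for the Friedman test on these results.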

Table 8 Friedman test results for the classification capability of the feature subsets selected by all algorithms

Statistic	Accuracy	AUC	Recall	Precision	F-measure	F2-measure
$\chi^2$	23.4094	27.5527	22.1585	29.2936	26.7608	32.5446
df	12	12	12	12	12	12
p	0.0244	0.0064	0.0358	0.0036	0.0084	0.0011
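
The p-values in Table 8 can be cross-checked from the reported statistics. The sketch below assumes the p-values are upper-tail probabilities of a chi-square distribution with df = 12, which is the usual approximation for the Friedman statistic but is not stated in this section. For even degrees of freedom the survival function has a closed form, so only the standard library is needed; the helper name is illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-square with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# (chi-square statistic, reported p-value) copied from Table 8, df = 12.
reported = {
    "Accuracy": (23.4094, 0.0244),
    "AUC": (27.5527, 0.0064),
    "Recall": (22.1585, 0.0358),
    "Precision": (29.2936, 0.0036),
    "F-measure": (26.7608, 0.0084),
    "F2-measure": (32.5446, 0.0011),
}

recomputed = {name: chi2_sf_even_df(stat, 12) for name, (stat, _) in reported.items()}
for name, (stat, p_table) in reported.items():
    print(f"{name:11s} chi2={stat:8.4f}  table p={p_table:.4f}  recomputed p={recomputed[name]:.4f}")
```

All six recomputed values fall below 0.05, so under this assumption the null hypothesis that the 13 algorithms perform equally is rejected for every metric, which is the precondition for the post-hoc Nemenyi comparison.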
