Journal Information

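  • Journal: 河北师范大学学报(自然科学版) / Journal of Hebei Normal University (Natural Science)
  • Sponsor: Hebei Normal University
  • ISSN: 1000-5854
  • CN: 13-1061/N
  • China Science and Technology Core Journal
  • Selected for the China Periodicals Matrix
  • Outstanding Sci-Tech Journal of Chinese Universities
  • Outstanding Journal of North China
  • Outstanding Sci-Tech Journal of Hebei Province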

Semi-supervised Relief-F Feature Selection Algorithm

  • (School of Computer Science and Technology, Shanxi University, Taiyuan, Shanxi 030006)
  • DOI: 10.13763/j.cnki.jhebnu.nse.202301013

Abstract

As data volumes keep growing, labeling every sample in a database becomes increasingly difficult, so many data sets are only weakly labeled. To address feature selection on large-scale data sets in which only a small fraction of samples is labeled, a semi-supervised feature selection algorithm for symbolic (categorical) data is proposed on the basis of the classical Relief-F algorithm. The algorithm takes into account the influence of both labeled and unlabeled samples on a sample's nearest neighbors and redefines the strategy used to search for those neighbors. To evaluate the new algorithm, simulation experiments were run on five UCI data sets, and three classifiers commonly used in machine learning were employed to compare the classification performance of the feature subsets selected by the new algorithm and by the comparison algorithms. The experimental results verify the effectiveness and feasibility of the proposed algorithm.
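The abstract builds on classical Relief-F but does not spell out the new neighbor-search rule for unlabeled samples, so the sketch below shows only the supervised Relief-F baseline for symbolic data, under stated assumptions: attributes are compared with the 0/1 mismatch (Hamming) distance, every sample serves once as the reference sample R instead of m randomly drawn ones, and the names `relief_f` and `hamming` are hypothetical. Near hits (same class as R) lower the weight of each attribute on which they differ from R, while near misses from every other class C raise it, scaled by P(C) / (1 - P(class(R))).

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two categorical samples differ."""
    return sum(x != y for x, y in zip(a, b))


def relief_f(X, y, k=3):
    """Classical (supervised) Relief-F attribute weights for categorical data.

    Sketch only: hypothetical names, not the paper's semi-supervised method.
    Every sample is used once as the reference sample R (m = n), which keeps
    the result deterministic; the original algorithm draws R at random.
    """
    n, d = len(X), len(X[0])
    prior = {c: cnt / n for c, cnt in Counter(y).items()}  # class priors P(C)
    w = [0.0] * d
    for i in range(n):
        r, cr = X[i], y[i]
        # Group all other samples by their class label.
        by_class = {}
        for j in range(n):
            if j != i:
                by_class.setdefault(y[j], []).append(j)
        for c, idx in by_class.items():
            # k nearest neighbours of R within class c (ties keep index order).
            nearest = sorted(idx, key=lambda j: hamming(r, X[j]))[:k]
            if c == cr:
                scale = -1.0  # near hits: differing attributes are penalised
            else:
                scale = prior[c] / (1.0 - prior[cr])  # near misses: rewarded
            for j in nearest:
                for a in range(d):
                    if r[a] != X[j][a]:  # diff(A, R, neighbour) = 1
                        # Average over the m = n reference samples and the
                        # neighbours actually found (k, unless class c is small).
                        w[a] += scale / (n * len(nearest))
    return w
```

For example, `relief_f([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 1, 1], k=1)` returns `[1.0, -1.0]`: the first attribute, which alone determines the class, receives the highest weight. Ranking attributes by descending weight and keeping the top ones gives the selected subset. The paper's semi-supervised version instead redefines how neighbors are searched using both labeled and unlabeled samples, which this sketch does not attempt to reproduce.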
