Study and Development of a Fast and Automatic Astronomical-transient-identification System
Abstract: Wide fields of view and high temporal sampling rates are the two main development directions of modern optical transient surveys. Compared with traditional surveys, they produce much larger data volumes and require transient candidates to be selected faster, automatically, and in the presence of noise. To meet these requirements, we present a fast and automatic transient identification system built on three methods: (1) 13 new features, including isophotal measurements of the object profile, that replace the earlier PSF (profile) fitting features; (2) a training sample simulated from the profiles of real objects, so that it is closer to real data; and (3) a noise filter criterion derived from an analysis of real observations. The system is implemented with the random forest algorithm, a supervised machine learning technique. Tests on simulated and real data show that it runs about 10 times faster than the mainstream identification algorithm of the same kind, while the overall true positive and false positive rates are essentially the same. At low signal-to-noise ratio our algorithm performs comparatively well, owing to its isophotal features. The system has been operating successfully in the online data processing pipeline of Mini-GWAC (Miniature Ground Wide Angle Camera array, the pathfinder of the Ground Wide Angle Camera array) in China, and it is also of practical value for other optical transient surveys.
Key words:
- machine learning
- random forest
- robotic identification of transients
- stellar profile
- isophotometry
1) Recommended by Associate Editor Hu Qing-Hua
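Table 1 Feature sets

| Group | No. | Feature | Description | Weight | Rank | Source |
|---|---|---|---|---|---|---|
| I | 1 | flux_radius2 | Aperture size enclosing 20% of the object's energy (pixels) | 0.1391 | 1 | New in this paper |
| I | 2 | flux_radius1 | Aperture size enclosing 10% of the object's energy (pixels) | 0.0548 | 6 | New in this paper |
| I | 3 | flux_aper | Flux in a fixed aperture ($r = 2.5$ pixels) | 0.0287 | 13 | New in this paper |
| I | 4 | ISO 0 | Area of isophotal region 0 (pixels²) | 0.0559 | 5 | New in this paper |
| I | 5 | ISO 1 | Area of isophotal region 1 (pixels²) | 0.0308 | 11 | New in this paper |
| I | 6 | ISO 2 | Area of isophotal region 2 (pixels²) | 0.0145 | 20 | New in this paper |
| I | 7 | ISO 3 | Area of isophotal region 3 (pixels²) | 0.0145 | 19 | New in this paper |
| I | 8 | ISO 4 | Area of isophotal region 4 (pixels²) | 0.0099 | 23 | New in this paper |
| I | 9 | r_max_aper | Ratio of the peak-pixel flux to the fixed-aperture flux | 0.1056 | 3 | New in this paper |
| I | 10 | r_aper_ISO | Ratio of the fixed-aperture flux to the isophotal flux | 0.0295 | 12 | New in this paper |
| I | 11 | r_aper_ISOCOR | Ratio of the fixed-aperture flux to the corrected isophotal flux | 0.0213 | 16 | New in this paper |
| I | 12 | mag_err_aper | RMS error of the magnitude | 0.0349 | 10 | New in this paper |
| I | 13 | class_star | Star/galaxy classification flag (value between 0 and 1) | 0.0072 | 25 | New in this paper |
| II | 14 | diffsum | Sum of all elements of the 5×5 matrix centered on the object in $R(d)$ | 0.0512 | 7 | Ref. [8] |
| II | 15 | colmeds | Maximum of the column medians in $B(d)$ | 0.0152 | 18 | Ref. [8] |
| II | 16 | numneg | Number of negative elements in the 7×7 matrix centered on the object in $R(d)$ | 0.0086 | 24 | Ref. [8] |
| II | 17 | a_image | RMS along the major axis, from SExtractor | 0.0138 | 21 | Ref. [8] |
| II | 18 | b_image | RMS along the minor axis, from SExtractor | 0.1152 | 2 | Ref. [8] |
| II | 19 | ellipticity | 1 - b_image/a_image, from SExtractor | 0.0908 | 4 | Ref. [8] |
| II | 20 | flags | SExtractor extraction flags on $I(d)$, from SExtractor | 0.0404 | 8 | Ref. [8] |
| II | 21 | mag_aper | Fixed-aperture magnitude, from SExtractor | 0.0361 | 9 | Ref. [8] |
| II | 22 | n2sig3 | Number of elements < -2 in the 5×5 matrix centered on the object in $R(d)$ | 0.0273 | 14 | Ref. [8] |
| II | 23 | n3sig3 | Number of elements < -3 in the 5×5 matrix centered on the object in $R(d)$ | 0.0228 | 15 | Ref. [8] |
| II | 24 | n3sig5 | Number of elements < -3 in the 7×7 matrix centered on the object in $R(d)$ | 0.0188 | 17 | Ref. [8] |
| II | 25 | n2sig5 | Number of elements < -2 in the 7×7 matrix centered on the object in $R(d)$ | 0.0135 | 22 | Ref. [8] |
| III | | r_aper_psf | (flux_aper + flux_psf)/flux_psf | 0.148 | | Ref. [8] |
| III | | flux_ratio | Ratio of the flux in the 5 pixels centered on the object in $I(d)$ to the absolute value of the flux in the 5 pixels centered on the object in $I(t)$ | 0.037 | | Ref. [8] |
| III | | n3sig3shift | Number of elements ≥ 3 in the 5×5 matrix centered on the object in $R(d)$ minus the same count in $R(t)$ | 0.019 | | Ref. [8] |
| III | | n3sig5shift | Number of elements ≥ 3 in the 7×7 matrix centered on the object in $R(d)$ minus the same count in $R(t)$ | 0.018 | | Ref. [8] |
| III | | n2sig3shift | Number of elements ≥ 2 in the 5×5 matrix centered on the object in $R(d)$ minus the same count in $R(t)$ | 0.014 | | Ref. [8] |
| III | | n2sig5shift | Number of elements ≥ 2 in the 7×7 matrix centered on the object in $R(d)$ minus the same count in $R(t)$ | 0.012 | | Ref. [8] |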
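Several Group II features in Table 1 are simple counts and sums over a small pixel box centered on the object. The following is a minimal sketch of how they could be computed, not the authors' implementation. It assumes that $R(d)$ is available as a 2-D NumPy array, that the object lies at least 3 pixels from the image edge, and that `a_image` and `b_image` have already been measured (for example by SExtractor). The helper names `stamp` and `stamp_features` are hypothetical.

```python
import numpy as np


def stamp(image, row, col, size):
    """Return the size x size box of `image` centered on (row, col)."""
    half = size // 2
    return image[row - half:row + half + 1, col - half:col + half + 1]


def stamp_features(r_d, row, col, a_image, b_image):
    """Compute a few Table 1 features from the array r_d (assumed to hold R(d))."""
    box5 = stamp(r_d, row, col, 5)  # 5x5 box centered on the object
    box7 = stamp(r_d, row, col, 7)  # 7x7 box centered on the object
    return {
        "diffsum": float(box5.sum()),            # sum of the 5x5 box
        "numneg": int((box7 < 0).sum()),         # negative elements in the 7x7 box
        "n2sig3": int((box5 < -2).sum()),        # elements < -2 in the 5x5 box
        "n3sig3": int((box5 < -3).sum()),        # elements < -3 in the 5x5 box
        "n3sig5": int((box7 < -3).sum()),        # elements < -3 in the 7x7 box
        "n2sig5": int((box7 < -2).sum()),        # elements < -2 in the 7x7 box
        "ellipticity": 1.0 - b_image / a_image,  # 1 - b_image/a_image
    }


# Toy example: a 9x9 array with one bright pixel and two negative pixels.
r_d = np.zeros((9, 9))
r_d[4, 4] = 10.0   # object center
r_d[3, 3] = -2.5   # inside both the 5x5 and the 7x7 box
r_d[1, 1] = -3.5   # inside the 7x7 box only
features = stamp_features(r_d, 4, 4, a_image=2.0, b_image=1.5)
```

Slicing once per box size and reusing the result keeps the per-candidate cost low, which matters when many candidates must be processed per image.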
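Table 2 Main parameters of the random forest

| Hyperparameter | Value | Description |
|---|---|---|
| n_estimators | 100 | Number of trees in the random forest |
| criterion | entropy | Decision function used to decide whether a tree node is split |
| n_jobs | -1 | Number of trees trained in parallel; -1 means the number equals the number of CPU cores |
| max_features | 5 | Maximum number of features drawn at random, without replacement, when training a node |
| min_samples_split | 3 | Minimum number of samples required to split a node |
| max_depth | unlimited | Maximum depth of the trees in the random forest |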
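The hyperparameter names in Table 2 coincide with those of scikit-learn's `RandomForestClassifier`, so the configuration can be sketched with that library. The source does not state which implementation was used, so this is an assumption. The training data below are synthetic random numbers standing in for the 25 features of Table 1, and `build_classifier` and the fixed `random_state` are illustrative additions, not part of the original.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def build_classifier(random_state=0):
    """Random forest configured with the Table 2 values (scikit-learn assumed)."""
    return RandomForestClassifier(
        n_estimators=100,        # number of trees
        criterion="entropy",     # split criterion
        n_jobs=-1,               # use all CPU cores
        max_features=5,          # features considered per split
        min_samples_split=3,     # minimum samples needed to split a node
        max_depth=None,          # unlimited depth
        random_state=random_state,  # assumption: fixed only for reproducibility
    )


# Synthetic stand-in data: 400 candidates with 25 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 25))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy labels: 1 = real, 0 = bogus

clf = build_classifier()
clf.fit(X[:300], y[:300])

# Probability of the "real" class for each held-out candidate.
scores = clf.predict_proba(X[300:])[:, 1]

# Normalized feature importances, comparable in kind to the weights in Table 1.
importances = clf.feature_importances_
```

Thresholding `scores` trades the true positive rate against the false positive rate; the threshold value itself is not given in this section.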
[1] Perlmutter S, Aldering G, Goldhaber G, Knop R A, Nugent P, Castro P G, Deustua S, Fabbro S, Goobar A, Groom D E. Measurements of Ω and Λ from 42 high-redshift supernovae. The Astrophysical Journal, 1999, 517(2): 565-586 https://www.physics.rutgers.edu/grad/690/Mar13-Hovey.pdf
[2] Riess A G, Filippenko A V, Challis P, Clocchiatti A, Diercks A, Garnavich P M, Gilliland R L, Hogan C J, Jha S, Kirshner R P, Leibundgut B, Phillips M M, Reiss D, Schmidt B P, Schommer R A, Smith R C, Spyromilio J, Stubbs C, Suntzeff N B, Tonry J. Observational evidence from supernovae for an accelerating universe and a cosmological constant. The Astronomical Journal, 1998, 116(3): 1009-1038 doi: 10.1086/300499
[3] Wu Chao, Zhang Tian-Meng, Wang Xiao-Feng, Qiu Yu-Lei. Supernova cosmology: observations and progress. Progress in Astronomy, 2013, 31(1): 37-55 (in Chinese) http://www.doc88.com/p-9179977318136.html
[4] Bailey S, Aragon C, Romano R, Thomas R C, Weaver B A, Wong D. How to find more supernovae with less work: object classification techniques for difference imaging. The Astrophysical Journal, 2007, 665(2): 1246-1253 https://arxiv.org/abs/0705.0493
[5] Brink H, Richards J W, Poznanski D, Bloom J S, Rice J, Negahban S, Wainwright M. Using machine learning for discovery in synoptic survey imaging data. Monthly Notices of the Royal Astronomical Society, 2013, 435(2): 1047-1060 doi: 10.1093/mnras/stt1306
[6] Bloom J S, Richards J W, Nugent P E, Quimby R M, Kasliwal M M, Starr D L, Poznanski D, Ofek E O, Cenko S B, Butler N R, Kulkarni S R, Gal-Yam A, Law N. Automating discovery and classification of transients and variable stars in the synoptic survey era. Publications of the Astronomical Society of the Pacific, 2012, 124(921): 1175-1196 doi: 10.1086/668468
[7] du Buisson L, Sivanandam N, Bassett B A, Smith M. Machine learning classification of SDSS transient survey images. Monthly Notices of the Royal Astronomical Society, 2015, 454(2): 2026-2038 doi: 10.1093/mnras/stv2041
[8] Goldstein D A, D'Andrea C B, Fischer J A, Foley R J, Gupta R R, Kessler R, Kim A G, Nichol R C, Nugent P E, Papadopoulos A, Sako M, Smith M, Sullivan M, Thomas R C, Wester W, Wolf R C, Abdalla F B, Banerji M, Benoit-Lévy A, Bertin E, Brooks D, Rosell A C, Castander F J, Costa L N D, Covarrubias R, DePoy D L, Desai S, Diehl H T, Doel P, Eifler T F, Neto A F, Finley D A, Flaugher B, Fosalba P, Frieman J, Gerdes D, Gruen D, Gruendl R A, James D, Kuehn K, Kuropatkin N, Lahav O, Li T S, Maia M A G, Makler M, March M, Marshall J L, Martini P, Merritt K W, Miquel R, Nord B, Ogando R, Plazas A A, Romer A K, Roodman A, Sanchez E, Scarpine V, Schubnell M, Sevilla-Noarbe I, Smith R C, Soares-Santos M, Sobreira F, Suchyta E, Swanson M E C, Tarle G, Thaler J, Walker A R. Automated transient identification in the dark energy survey. The Astronomical Journal, 2015, 150(3): Article No. 82 http://www.oalib.com/paper/3558300
[9] Bertin E, Arnouts S. SExtractor: software for source extraction. Astronomy and Astrophysics Supplement Series, 1996, 117: 393-404 doi: 10.1051/aas:1996164
[10] Breiman L. Random forests. Machine Learning, 2001, 45(1): 5-32
[11] Fang Kuang-Nan, Wu Jian-Bin, Zhu Jian-Ping, Xie Bang-Chang. A review of technologies on random forests. Statistics and Information Forum, 2011, 26(3): 32-38 (in Chinese) http://dspace.xmu.edu.cn/handle/2288/112057?show=full
[12] Huang Yan, Zha Wei-Xiong. Comparison on classification performance between random forests and support vector machine. Software, 2012, 33(6): 107-110 (in Chinese) http://www.docin.com/p-497267267.html