With the availability of multi-object spectrometers and the design and operation of several large-scale sky surveys, the question of how to handle enormous quantities of spectral data efficiently and accurately is becoming increasingly important. This work investigates the classification of stellar spectra under the assumption that no perfect absolute flux calibration is available, for example, when considering spectra from the Guo Shou Jing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST). The proposed scheme consists of two procedures: first, a spectrum is normalized based on a 17th-order polynomial fit; second, a random forest (RF) is used to classify the stellar spectra. Experiments on four stellar spectral libraries show that the RF has good classification performance. This work also studies spectral feature evaluation based on the RF. The evaluation helps in understanding the results of the proposed stellar classification scheme and in exploring potential improvements.
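The normalization step can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit over the whole spectrum and division of the flux by the fitted curve; the abstract does not state these details, and the function name `normalize_by_polynomial` and the synthetic spectrum are hypothetical. The random forest stage is not shown.

```python
import numpy as np
from numpy.polynomial import Polynomial


def normalize_by_polynomial(wavelength, flux, degree=17):
    """Divide a spectrum by a least-squares polynomial fit (assumed procedure)."""
    # Polynomial.fit maps the wavelengths onto [-1, 1] before fitting,
    # which keeps a high-order fit numerically well conditioned.
    pseudo_continuum = Polynomial.fit(wavelength, flux, degree)(wavelength)
    return flux / pseudo_continuum


# Synthetic spectrum (illustrative only): smooth shape times one absorption line.
wavelength = np.linspace(3800.0, 9000.0, 2000)
shape = 1.0 + 0.5 * np.sin(wavelength / 1500.0)
line = 1.0 - 0.4 * np.exp(-0.5 * ((wavelength - 6563.0) / 5.0) ** 2)
flux = 3.7 * shape * line

normalized = normalize_by_polynomial(wavelength, flux)
# An unknown multiplicative flux scale drops out of the normalized spectrum.
rescaled = normalize_by_polynomial(wavelength, 10.0 * flux)
```

Because the fit is linear in the flux, multiplying the spectrum by a constant multiplies the fitted curve by the same constant, so the ratio is unchanged. This is one reason such a normalization suits spectra without absolute flux calibration.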
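Flux normalization is a basic step in spectral data mining, and it strongly affects both the accuracy of the mining results and the efficiency of the system. Because commonly used methods suffer from low efficiency, this work studies the algorithm design and efficiency comparison of flux normalization in spectral data mining. First, the asymptotic efficiency of different implementations of spectral flux normalization is examined, algorithms for efficient computation are given, and their time and space complexities are analyzed. Then, the efficiency differences among flux normalization algorithms are compared horizontally using observed spectra from the SDSS (Sloan Digital Sky Survey). The vertical, theoretical study of the flux normalization algorithms mainly considers how computational efficiency changes as the data size grows, and is carried out in the limiting sense. The horizontal, experimental comparison focuses on differences in the time complexity of the basic operations in the different algorithms and their effect on efficiency. Theoretical and experimental results show that, although the four normalization methods Smax, Smedian, Smean and Sunit belong to the same class of asymptotic efficiency, for spectra at commonly observed data sizes Smax and Smean are far more efficient than Sunit and Smedian, and the commonly used Sunit method is the least efficient. This study provides a useful reference for choosing a suitable flux normalization method in spectral data mining and development, weighing accuracy against efficiency as a whole according to data size and specific requirements.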
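The four normalizations can be sketched as follows. The abstract names the methods but does not define them, so the definitions here are assumptions: each divides the flux by a single scale factor (maximum, median, mean, or Euclidean norm). The function names are hypothetical, and the median below uses the sorting-based `statistics.median` for brevity, not the efficient algorithms the abstract refers to.

```python
import math
import statistics


def s_max(flux):
    """Scale so the largest flux value becomes 1 (assumed definition of Smax)."""
    scale = max(flux)
    return [f / scale for f in flux]


def s_median(flux):
    """Scale so the median flux becomes 1 (assumed definition of Smedian)."""
    scale = statistics.median(flux)  # sorts internally
    return [f / scale for f in flux]


def s_mean(flux):
    """Scale so the mean flux becomes 1 (assumed definition of Smean)."""
    scale = sum(flux) / len(flux)
    return [f / scale for f in flux]


def s_unit(flux):
    """Scale to unit Euclidean norm (assumed definition of Sunit)."""
    scale = math.sqrt(sum(f * f for f in flux))
    return [f / scale for f in flux]


flux = [1.0, 2.0, 3.0, 4.0, 10.0]  # toy spectrum, illustrative only
by_max = s_max(flux)
by_median = s_median(flux)
by_mean = s_mean(flux)
by_unit = s_unit(flux)
```

Under these assumed definitions the basic operations differ: the maximum needs only comparisons, the mean only additions, the unit norm a multiplication per sample plus a square root, and the median a selection or sort. Differences of this kind are what the experimental comparison in the abstract is about.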
Large-scale sky surveys are observing massive amounts of stellar spectra. The large number of stellar spectra makes it necessary to parameterize spectral data automatically, which in turn helps in statistically exploring properties related to the atmospheric parameters. This work focuses on designing an automatic scheme to estimate effective temperature (Teff), surface gravity (log g) and metallicity [Fe/H] from stellar spectra. A scheme based on three deep neural networks (DNNs) is proposed. This scheme consists of the following three procedures: first, the configuration of a DNN is initialized using a series of autoencoder neural networks; second, the DNN is fine-tuned using a gradient descent scheme; third, the three atmospheric parameters Teff, log g and [Fe/H] are estimated using the computed DNNs. The constructed DNN is a neural network with six layers (one input layer, one output layer and four hidden layers), in which the numbers of nodes are 3821, 1000, 500, 100, 30 and 1, respectively. The proposed scheme was tested on both real spectra and theoretical spectra from Kurucz's new opacity distribution function models. Test errors are measured with mean absolute errors (MAEs). The errors on real spectra from the Sloan Digital Sky Survey (SDSS) are 0.1477, 0.0048 and 0.1129 dex for log g, log Teff and [Fe/H] (64.85 K for Teff), respectively. For theoretical spectra from Kurucz's new opacity distribution function models, the MAEs are 0.0182, 0.0011 and 0.0112 dex for log g, log Teff and [Fe/H] (14.90 K for Teff), respectively.
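To give a sense of the size of the described network and of the error metric, here is a small sketch. It assumes fully connected layers with one bias per non-input node, which the abstract does not state; the helper names and the toy values passed to `mean_absolute_error` are hypothetical and are not results from the paper.

```python
# Layer sizes as listed in the abstract: input, four hidden layers, output.
LAYER_SIZES = [3821, 1000, 500, 100, 30, 1]


def count_parameters(layer_sizes):
    """Count weights and biases, assuming dense layers with biases."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weights between consecutive layers
        total += n_out         # one bias per node in the receiving layer
    return total


def mean_absolute_error(predicted, reference):
    """MAE: the average of |predicted - reference| over all test spectra."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


n_params = count_parameters(LAYER_SIZES)  # 4375661 under the stated assumption
toy_mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])  # 0.5
```

Since the scheme uses three DNNs, one per parameter, the total under this assumption would be three times `n_params`.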