Terahertz spectroscopy combined with data dimensionality reduction algorithms for quantitative analysis of protein content in soybeans.

Wei X(1), Li S(2), Zhu S(3), Zheng W(4), Xie Y(4), Zhou S(2), Hu M(2), Miao Y(2), Ma L(2), Wu W(5), Xie Z(5).
Author information:
(1)College of Engineering and Technology, Southwest University, Chongqing 400716, China. Electronic address: [Email]
(2)College of Engineering and Technology, Southwest University, Chongqing 400716, China.
(3)College of Engineering and Technology, Southwest University, Chongqing 400716, China. Electronic address: [Email]
(4)College of Food Science, Southwest University, Chongqing 400716, China.
(5)China Tianjin Grain and Oil Wholesale Trade Market, Tianjin 300171, China.

Abstract

Protein content is a key determinant of the nutritional and economic value of soybeans. This study investigated the feasibility of combining terahertz (THz) spectroscopy with dimensionality reduction algorithms to determine the protein content of soybeans. First, the THz spectra of the samples were processed with pre-processing or dimensionality reduction algorithms. Second, partial least squares regression (PLSR), genetic algorithm-support vector regression (GA-SVR), grey wolf optimizer-support vector regression (GWO-SVR), and back propagation neural network (BPNN) models of protein content were each built on the calibration set. The models were then validated on the prediction set. Ultimately, the BPNN model combined with linear discriminant analysis (LDA) gave a correlation coefficient of the prediction set (Rp) of 0.9677, a root mean square error of prediction (RMSEP) of 1.2467%, a relative standard deviation (RSD) of 3.3664%, and a computation time of 53.51 s. The experimental results showed that rapid and accurate quantitative determination of protein in soybeans by THz spectroscopy is feasible when a suitable dimensionality reduction algorithm is applied.
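
The abstract reports its evaluation metrics without giving formulas. As an illustration only, the sketch below computes them under common chemometrics conventions: Rp as the Pearson correlation between reference and predicted protein content, RMSEP as the root mean square of the prediction errors, and RSD as RMSEP divided by the mean reference value, expressed in percent. The RSD definition in particular is an assumption, not something the abstract states, and the function name `prediction_metrics` is hypothetical.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return Rp, RMSEP and RSD (%) for a prediction set.

    Assumed definitions (not given in the abstract):
      Rp    = Pearson correlation between y_true and y_pred
      RMSEP = sqrt(mean((y_true - y_pred)^2))
      RSD   = 100 * RMSEP / mean(y_true)
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Sums of centred products for the Pearson correlation.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_t == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for constant input")
    rp = cov / math.sqrt(ss_t * ss_p)

    # Root mean square error over the prediction set.
    rmsep = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

    # Error relative to the mean reference value, in percent (assumed definition).
    rsd = 100.0 * rmsep / mean_t

    return {"Rp": rp, "RMSEP": rmsep, "RSD_percent": rsd}


if __name__ == "__main__":
    # Made-up protein contents in percent, for illustration only (not the paper's data).
    reference = [36.2, 38.5, 40.1, 35.4, 39.0]
    predicted = [36.9, 37.8, 40.6, 36.1, 38.2]
    print(prediction_metrics(reference, predicted))
```

Because RMSEP is expressed in the same units as the protein content (percent by mass), dividing it by the mean reference value gives a scale-free error, which is one plausible reading of the RSD figure reported above.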