Machine learning guided prediction of liquid chromatography-mass spectrometry ionization efficiency for genotoxic impurities in pharmaceutical products.

Miyamoto K(1), Mizuno H(2), Sugiyama E(2), Toyo'oka T(2), Todoroki K(3).
Author information:
(1)Analytical Research Laboratories, Astellas Pharma Inc., 180 Ozumi, Yaizu, Shizuoka 425-0072, Japan; Department of Analytical and Bioanalytical Chemistry, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan. Electronic address: [Email]
(2)Department of Analytical and Bioanalytical Chemistry, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan.
(3)Department of Analytical and Bioanalytical Chemistry, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan. Electronic address: [Email]

Abstract

The limitation and control of genotoxic impurities (GTIs) have received continued attention from pharmaceutical companies and authorities for several decades. Because GTIs can damage deoxyribonucleic acid (DNA) and have the potential to cause cancer, low-level quantitation is required to protect patients. A quick and easy method of determining the liquid chromatography-mass spectrometry (LC/MS) conditions for high-sensitivity analysis of GTIs could accelerate pharmaceutical development. In this study, a quantitative structure-property relationship (QSPR) model was developed for predicting the ionization efficiency of compounds using LC/MS parameters and molecular descriptors. Before implementing the QSPR prediction model, linear regression analysis was performed to model the relationship between the ionization efficiency and the LC/MS parameters for each compound. The predicted peak areas agreed well with the experimentally observed peak areas, as judged by the coefficient of determination (R² > 0.96). The machine learning-based QSPR approach begins with computation of the molecular descriptors expressing the physicochemical properties of a compound, followed by genetic algorithm-based feature selection. Linear and nonlinear regression were performed, and a support vector machine (SVM) was selected as the best machine learning algorithm for the prediction. The SVM model was developed and optimized using 1031 experimental data points for nine compounds, including well-known GTIs. Validation of the model by comparison of the predicted and observed relative ionization efficiencies (RIE) showed a high coefficient of determination (R² = 0.96) and a low root mean squared error (RMSE = 0.118). Finally, the established prediction model was applied to hydrophilic interaction liquid chromatography coupled with MS for a new compound, using new mobile phase compositions and new MS parameter settings. The RMSE of the predicted versus observed RIE was 0.203. This prediction accuracy was sufficient to determine the starting point of LC/MS method development. The methodology demonstrated in this study can be used to determine the LC/MS conditions for high-sensitivity analysis of GTIs.
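
The two validation metrics reported above, the coefficient of determination and the RMSE between predicted and observed RIE, can be illustrated with a minimal sketch. The abstract does not state which definition of R² was used; the code below assumes the common form 1 − SS_res/SS_tot. The function names and the numerical values are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot definition."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared differences between observed and predicted
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations of observed values from their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical RIE values for illustration only (not from the study)
observed_rie = [0.10, 0.45, 0.80, 1.20, 1.60]
predicted_rie = [0.15, 0.40, 0.85, 1.10, 1.65]

print("R^2  =", round(r_squared(observed_rie, predicted_rie), 3))
print("RMSE =", round(rmse(observed_rie, predicted_rie), 3))
```

Because the RMSE is expressed in the same units as the RIE, values such as the reported 0.118 and 0.203 can be read directly as a typical prediction error on the RIE scale used in the study.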