Binary genetic algorithm for optimal joinpoint detection: Application to cancer trend analysis.


Kim S(1), Lee S(2), Choi JI(1), Cho H(2).
Author information:
(1)School of Mathematics and Computing (Computational Science and Engineering), Yonsei University, Seoul, Korea.
(2)Department of Cancer Control and Population Health, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Korea.


The joinpoint regression model (JRM) is used to describe trend changes in many applications and relies on the detection of joinpoints (changepoints). However, the existing joinpoint detection methods, namely the grid search (GS)-based methods, are computationally demanding, which limits the maximum number of joinpoints that can be computed. Herein, we developed a genetic algorithm-based joinpoint (GAJP) model that embeds a binary genetic algorithm into the JRM for optimal joinpoint detection, using an explicitly decoupled computing procedure for optimization and regression. Combinations of joinpoints were represented as binary chromosomes, and genetic operations were performed to find the optimal solution by minimizing a fitness function, either the Bayesian information criterion (BIC) or BIC3. The accuracy and computational performance of the GAJP model were evaluated in extensive simulation studies and compared with those of the GS-based methods using BIC, BIC3, and the permutation test. The proposed method showed outstanding computational efficiency in detecting multiple joinpoints. Finally, the suitability of the GAJP model for analyzing cancer incidence trends was demonstrated by applying it to data on colorectal cancer incidence in the United States from 1975 to 2016, obtained from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program. Thus, we concluded that the GAJP model is practically feasible for detecting multiple joinpoints, up to the number of grid points, without requiring the number of joinpoints to be preassigned, and that it can be easily extended to cancer trend analysis of large datasets.
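To make the chromosome encoding and fitness evaluation concrete, here is a minimal sketch in Python (standard library only). It is not the authors' GAJP implementation. It assumes a continuous piecewise-linear model fitted directly to y with the hinge basis 1, x, (x - tau_j)_+; candidate joinpoints at the interior grid points with a hypothetical margin of two points at each end; a BIC of the form ln(SSE/n) + 2(k+1) ln(n)/n for k joinpoints; and textbook GA operators (tournament selection, single-point crossover, bit-flip mutation, elitism). The function names and parameter values (`pop_size`, `generations`, `tournament`) are illustrative assumptions. BIC3, the permutation test, minimum-segment-length constraints, and log-transformation of rates are not implemented.

```python
import math
import random


def solve_least_squares(X, y):
    """Solve min ||y - X b||^2 via the normal equations (Gauss-Jordan with
    partial pivoting). Returns the coefficient list, or None if singular."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-10:
            return None
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def bic(x, y, joinpoints):
    """Assumed BIC = ln(SSE/n) + 2(k+1) ln(n)/n for a continuous piecewise-linear
    fit with k joinpoints (basis: 1, x, and one hinge (x - tau)_+ per joinpoint)."""
    n, k = len(y), len(joinpoints)
    X = [[1.0, float(xi)] + [max(0.0, xi - t) for t in joinpoints] for xi in x]
    beta = solve_least_squares(X, y)
    if beta is None:
        return math.inf
    # SSE is computed from the residuals, so an imprecise solve can only raise it.
    sse = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return math.log(max(sse, 1e-12) / n) + 2 * (k + 1) * math.log(n) / n


def decode(chrom, candidates):
    """Binary chromosome -> joinpoints: bit i = 1 selects candidate grid point i."""
    return [t for bit, t in zip(chrom, candidates) if bit]


def ga_joinpoints(x, y, pop_size=40, generations=80, tournament=3, seed=0):
    """Minimize the BIC over binary chromosomes with a simple generational GA."""
    rng = random.Random(seed)
    candidates = list(x[2:-2])        # interior grid points (hypothetical margin)
    L = len(candidates)
    p_mut = 1.0 / L                   # about one bit flip per child on average
    cache = {}

    def fitness(chrom):
        key = tuple(chrom)
        if key not in cache:
            cache[key] = bic(x, y, decode(chrom, candidates))
        return cache[key]

    # Sparse random start: each bit set with probability 2/L (~2 joinpoints each).
    pop = [[int(rng.random() < 2.0 / L) for _ in range(L)] for _ in range(pop_size)]
    pop[0] = [0] * L                  # include the zero-joinpoint model
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        nxt = [ranked[0][:], ranked[1][:]]          # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, tournament), key=fitness)  # tournament selection
            p2 = min(rng.sample(pop, tournament), key=fitness)
            cut = rng.randrange(1, L)                # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit flips
            nxt.append(child)
        pop = nxt
    best = min(pop, key=fitness)
    return decode(best, candidates), fitness(best)


if __name__ == "__main__":
    # Synthetic series with slope changes at x = 10 and x = 20 plus small noise.
    xs = list(range(30))
    ys = [(xi if xi <= 10 else (20 - xi if xi <= 20 else 0.5 * (xi - 20)))
          + 0.05 * ((7 * xi) % 5 - 2) for xi in xs]
    print(ga_joinpoints(xs, ys, seed=1))
```

Caching fitness by chromosome keeps repeated individuals from triggering a new regression, which mirrors the decoupling described above: the GA only proposes bit strings, and the regression is a separate function that scores them. Because the number of joinpoints is simply the number of 1-bits, it is not fixed in advance and can in principle range up to the number of candidate grid points.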