Early diagnosis of thyroid cancer diseases using computational intelligence techniques: A case study of a Saudi Arabian dataset.

Olatunji SO(1), Alotaibi S(2), Almutairi E(1), Alrabae Z(1), Almajid Y(1), Altabee R(1), Altassan M(1), Basheer Ahmed MI(1), Farooqui M(1), Alhiyafi J(1).
Author information:
(1)Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia.
(2)Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam, 31441, Saudi Arabia.

Abstract

Chronic diseases have become more common in recent years. In the Kingdom of Saudi Arabia, the number of patients with thyroid cancer (TC) has become a concern, calling for a proactive system that supports early intervention to prevent or cure the disease and thereby helps reduce its incidence. In this paper, we present machine learning-based tools that can serve as early warning systems by detecting TC at a very early (pre-symptomatic) stage. In addition, we aimed to obtain the highest possible accuracy while using fewer features. Although there have been past efforts to use machine learning to predict TC, this is the first attempt to use a Saudi Arabian dataset and to target diagnosis in the pre-symptomatic stage (pre-emptive diagnosis). The techniques used in this work are random forest (RF), artificial neural network (ANN), support vector machine (SVM), and naïve Bayes (NB), each of which was selected for its unique capabilities. The highest accuracy obtained was 90.91% with the RF technique, while SVM, ANN, and NB achieved 84.09%, 88.64%, and 81.82%, respectively. These results were obtained using only seven of the 15 available features. Given this pattern of results, RF outperformed the other techniques and is therefore recommended for this specific problem.
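To illustrate the kind of comparison the abstract describes (accuracy with a reduced feature set versus all available features), here is a minimal, standard-library-only sketch. It is not the authors' pipeline. The data are synthetic and hypothetical, the choice of seven feature indices is an assumption made for illustration (the abstract does not state how the features were selected), and a Gaussian naïve Bayes classifier is used instead of the recommended RF only because it is compact enough to write from scratch.

```python
import math
import random
from statistics import mean, pvariance


def make_synthetic(n, n_features=15, n_informative=7, seed=0):
    """Hypothetical stand-in data, NOT the Saudi Arabian dataset.

    The first n_informative features shift with the class label;
    the remaining features are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced binary classes
        row = [rng.gauss(1.5 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def fit_gaussian_nb(X, y, features):
    """Estimate class priors and per-feature mean/variance for the chosen indices."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            col = [r[j] for r in rows]
            stats.append((mean(col), pvariance(col) + 1e-9))  # avoid zero variance
        model[c] = (len(rows) / len(X), stats)
    return model


def predict(model, x, features):
    """Return the class with the highest log posterior under the fitted model."""
    best, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for j, (mu, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best


def accuracy(model, X, y, features):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(model, x, features) == label for x, label in zip(X, y))
    return correct / len(y)


X_train, y_train = make_synthetic(200, seed=0)
X_test, y_test = make_synthetic(100, seed=1)

all_features = list(range(15))
subset = list(range(7))  # assumption: the seven informative synthetic features

acc_all = accuracy(fit_gaussian_nb(X_train, y_train, all_features),
                   X_test, y_test, all_features)
acc_subset = accuracy(fit_gaussian_nb(X_train, y_train, subset),
                      X_test, y_test, subset)

print(f"accuracy with 15 features: {acc_all:.2%}")
print(f"accuracy with 7 features:  {acc_subset:.2%}")
```

The numbers this prints describe only the synthetic data and have no bearing on the accuracies reported in the abstract. In practice, the same train, predict, and score loop would be repeated for each candidate classifier (RF, ANN, SVM, NB) on a held-out portion of the real dataset.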