Evolutionary Algorithm based Ensemble Extractive Summarization for Developing Smart Medical System.


Mallick C(1), Das AK(2), Nayak J(3), Pelusi D(4), Vimal S(5).
Author information:
(1)Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India.
(2)Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India.
(3)Department of Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh, 532201, India.
(4)Department of Communications Sciences, University of Teramo, Teramo, Italy.
(5)Department of Information Technology, National Engineering College, K.R.Nagar, Kovilpatti, Thoothukudi District, Tamilnadu, 628503, India.


The amount of information in the biomedical scientific literature is growing exponentially, which makes developing a smart medical system difficult. Summarization techniques support efficient searching for, and understanding of, relevant information in medical documents. In this paper, an evolutionary-algorithm-based ensemble extractive summarization technique is devised as a smart medical application, following the idea of hybrid artificial intelligence applied to natural language processing. We consider the abstracts of the target article and of its cited articles as the base summaries, and apply a multi-objective evolutionary algorithm to generate an ensemble summary of the target article. Each sentence of the base summaries is represented by a concept vector of the medical terms it contains, obtained with the help of the Unified Medical Language System (UMLS), which is widely used in smart medical applications. These terms carry the key information of a sentence and are therefore useful for measuring the semantic similarity between sentences. The fitness functions of the evolutionary algorithm are defined mainly in terms of the clustering coefficient and the sparsity index, two concepts from graph theory. After the algorithm converges, the best solution in the final population gives the ensemble summary. Next, the semantic similarity of each sentence in the target article to the ensemble summary is calculated, and the sentences most similar to the ensemble summary are taken as the summary of the target article. The method is applied to articles from the PubMed MEDLINE database, and the experimental results are compared with several state-of-the-art methods from the biomedical domain.
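The abstract does not give the exact fitness formulas, so the following is only a minimal Python sketch, under stated assumptions, of how the two graph-theoretic quantities could be computed for a candidate ensemble summary. It assumes each base-summary sentence has already been reduced to a set of UMLS concept identifiers (the strings used here are placeholders, not real identifiers), treats that set as a binary concept vector compared by cosine similarity, links two sentences when their similarity reaches a hypothetical threshold of 0.3, and encodes a candidate summary as a 0/1 mask over the base-summary sentences. The sparsity index is assumed to be one minus the edge density of the induced subgraph; this definition, the threshold, and all function names are illustrative assumptions rather than the authors' method.

```python
from itertools import combinations
from math import sqrt


def cosine(a: frozenset, b: frozenset) -> float:
    """Cosine similarity of two binary concept vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similarity_graph(sentences, threshold=0.3):
    """Adjacency sets: link two sentences if their cosine is >= threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if cosine(sentences[i], sentences[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def induced(adj, nodes):
    """Subgraph induced by the selected sentence indices."""
    nodes = set(nodes)
    return {v: adj[v] & nodes for v in nodes}


def avg_clustering(sub):
    """Average local clustering coefficient; nodes with degree < 2 count as 0."""
    if not sub:
        return 0.0
    total = 0.0
    for v, nbrs in sub.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of v.
        links = sum(1 for u, w in combinations(nbrs, 2) if w in sub[u])
        total += 2 * links / (k * (k - 1))
    return total / len(sub)


def sparsity_index(sub):
    """Assumed definition: 1 - edge density of the induced subgraph."""
    n = len(sub)
    if n < 2:
        return 1.0
    edges = sum(len(nbrs) for nbrs in sub.values()) // 2
    return 1 - edges / (n * (n - 1) / 2)


def objectives(mask, adj):
    """Evaluate a candidate ensemble summary encoded as a 0/1 mask."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    sub = induced(adj, chosen)
    return avg_clustering(sub), sparsity_index(sub)


if __name__ == "__main__":
    # Placeholder concept identifiers, not real UMLS CUIs.
    sentences = [
        frozenset({"C1", "C2"}),
        frozenset({"C2", "C3"}),
        frozenset({"C1", "C3"}),
        frozenset({"C9"}),
    ]
    adj = similarity_graph(sentences)
    print(objectives([1, 1, 1, 0], adj))  # (1.0, 0.0)
    print(objectives([1, 1, 0, 1], adj))  # approximately (0.0, 0.667)
```

In this toy graph the first three sentences form a triangle and the fourth is isolated, so selecting the triangle gives a clustering coefficient of 1.0 and a sparsity index of 0.0. A multi-objective optimizer (for example, one based on non-dominated sorting) would trade these values off across the population; whether each objective is maximized or minimized is not stated in the abstract.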
Experimental results and a comparative performance evaluation show that the method is competitive with some recently proposed summarization methods and outperforms others, demonstrating the effectiveness of the proposed methodology. Several statistical tests were also performed, and they indicate that the results of the method are statistically significant.
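The final extraction step described in the abstract, scoring every sentence of the target article against the ensemble summary and keeping the most similar ones, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes sentences are sets of (placeholder) UMLS concept identifiers, represents the ensemble summary by the union of its sentences' concepts, uses cosine similarity as the semantic similarity measure, and keeps a hypothetical number `k` of sentences, returned in document order.

```python
from math import sqrt


def cosine(a: frozenset, b: frozenset) -> float:
    """Cosine similarity of two binary concept vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def extract_summary(target_sentences, ensemble_summary, k):
    """Return indices of the k target sentences most similar to the summary.

    Assumption: the ensemble summary is represented by the union of the
    concepts in its sentences.
    """
    summary_concepts = frozenset().union(*ensemble_summary)
    scores = [cosine(s, summary_concepts) for s in target_sentences]
    # sorted() is stable (also with reverse=True), so ties keep document order.
    ranked = sorted(range(len(target_sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the top-k and restore document order for readability.
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Placeholder concept identifiers, not real UMLS CUIs.
    target = [
        frozenset({"C1", "C2"}),
        frozenset({"C7"}),
        frozenset({"C2", "C3"}),
        frozenset({"C1"}),
    ]
    ensemble = [frozenset({"C1", "C2"}), frozenset({"C3"})]
    print(extract_summary(target, ensemble, 2))  # [0, 2]
```

Using the union of concepts is only one way to compare a sentence with a multi-sentence summary; taking the maximum similarity to any single summary sentence would be an equally plausible alternative.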