Improving sentiment analysis on clinical narratives by exploiting UMLS semantic types.

Sanglerdsinlapachai N(1), Plangprasopchok A(2), Ho TB(3), Nantajeewarawat E(4).
Author information:
(1)National Electronics and Computer Technology Center, Pathumthani, Thailand; Japan Advanced Institute of Science and Technology, Ishikawa, Japan; Sirindhorn International Institute of Technology, Thammasat University, Pathumthani, Thailand.
(2)National Electronics and Computer Technology Center, Pathumthani, Thailand.
(3)Japan Advanced Institute of Science and Technology, Ishikawa, Japan; John von Neumann Institute, Vietnam National University, Ho Chi Minh City, Vietnam.
(4)Sirindhorn International Institute of Technology, Thammasat University, Pathumthani, Thailand.

Abstract

Sentiments associated with the assessments and observations recorded in a clinical narrative can often indicate a patient's health status. Performing sentiment analysis on clinical narratives requires domain-specific knowledge of the meanings of medical terms. In this study, semantic types in the Unified Medical Language System (UMLS) are exploited to improve lexicon-based sentiment classification methods. For sentiment classification using SentiWordNet, overall accuracy improves from 0.582 to 0.710 when logistic regression is used to determine appropriate polarity scores for UMLS 'Disorders' semantic types. For sentiment classification using a trained lexicon, replacing disorder terms in the training set with their semantic types improves classification accuracy on some data segments containing specific semantic types. To select an appropriate classification method for a given data segment, classifier combination is proposed. With classifier combination, classification accuracy improves on most data segments, and an overall accuracy of 0.882 is obtained.
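
The abstract does not say how the classifier combination chooses a method for each data segment. The sketch below shows one plausible reading, under stated assumptions: a segment is keyed by the UMLS semantic type found in a sentence, and the classifier with the most correct predictions on held-out examples for that segment is selected. The two toy classifiers, the word lists, the example sentences, and the names `select_per_segment` and `combined_predict` are all hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Toy word lists (assumptions for illustration, not the study's lexicons).
POSITIVE = {"improved", "stable"}
NEGATIVE = {"worse", "severe"}
# Hypothetical mapping from disorder terms to UMLS semantic types.
DISORDER_TYPES = {"pneumonia": "Disease or Syndrome", "fever": "Sign or Symptom"}


def general(sentence):
    """General-purpose lexicon: disorder terms carry no polarity."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"


def type_aware(sentence):
    """Same lexicon, but every term with a disorder semantic type counts as -1."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    score -= sum(w in DISORDER_TYPES for w in words)
    return "pos" if score >= 0 else "neg"


def select_per_segment(classifiers, validation):
    """Pick, for each segment, the classifier with the most correct predictions."""
    hits = defaultdict(lambda: {name: 0 for name in classifiers})
    for segment, sentence, gold in validation:
        for name, clf in classifiers.items():
            hits[segment][name] += int(clf(sentence) == gold)
    # max() returns the first best name, so dict order breaks ties.
    return {seg: max(counts, key=counts.get) for seg, counts in hits.items()}


def combined_predict(sentence, segment, classifiers, choice, default):
    """Route a sentence to the classifier chosen for its segment."""
    return classifiers[choice.get(segment, default)](sentence)


classifiers = {"general": general, "type_aware": type_aware}
# Held-out toy examples: (segment, sentence, gold label).
validation = [
    ("Disease or Syndrome", "patient has pneumonia", "neg"),
    ("Disease or Syndrome", "pneumonia is worse", "neg"),
    ("Sign or Symptom", "no fever today", "pos"),
    ("Sign or Symptom", "fever resolved and stable", "pos"),
]
choice = select_per_segment(classifiers, validation)
label = combined_predict(
    "patient has pneumonia", "Disease or Syndrome", classifiers, choice, "general"
)
```

On this toy data the type-aware classifier is selected for the 'Disease or Syndrome' segment and the general classifier for 'Sign or Symptom', which mirrors the abstract's observation that semantic-type replacement helps only on some segments. Segments that never appear in the held-out data fall back to the `default` classifier.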