Deep learning-based auto-delineation of gross tumour volumes and involved nodes in PET/CT images of head and neck cancer patients.


Moe YM(1), Groendahl AR(1), Tomic O(1), Dale E(2), Malinen E(3)(4), Futsaether CM(5).
Author information:
(1)Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway.
(2)Department of Oncology, Oslo University Hospital, Oslo, Norway.
(3)Department of Medical Physics, Oslo University Hospital, Oslo, Norway.
(4)Department of Physics, University of Oslo, Oslo, Norway.
(5)Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway.


PURPOSE: Identification and delineation of the gross tumour and malignant nodal volume (GTV) in medical images are vital in radiotherapy. We assessed the applicability of convolutional neural networks (CNNs) for fully automatic delineation of the GTV from FDG-PET/CT images of patients with head and neck cancer (HNC). CNN models were compared to manual GTV delineations made by experienced specialists. New structure-based performance metrics were introduced to enable in-depth assessment of auto-delineation of multiple malignant structures in individual patients.

METHODS: U-Net CNN models were trained and evaluated on images and manual GTV delineations from 197 HNC patients. The dataset was split into training, validation and test cohorts (n = 142, n = 15 and n = 40, respectively). The Dice score, surface distance metrics and the new structure-based metrics were used for model evaluation. Additionally, auto-delineations were manually assessed by an oncologist for 15 randomly selected patients in the test cohort.

RESULTS: The mean Dice scores of the auto-delineations were 55%, 69% and 71% for the CT-based, PET-based and PET/CT-based CNN models, respectively. The PET signal was essential for delineating all structures. Models based on PET/CT images identified 86% of the true GTV structures, whereas models built solely on CT images identified only 55% of the true structures. The oncologist reported very high-quality auto-delineations for 14 out of the 15 randomly selected patients.

CONCLUSIONS: CNNs provided high-quality auto-delineations for HNC using multimodality PET/CT. The introduced structure-wise evaluation metrics provided valuable information on CNN model strengths and weaknesses for multi-structure auto-delineation.
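To illustrate the two kinds of evaluation mentioned above, the following is a minimal sketch rather than the authors' implementation. The Dice score follows its standard definition, 2|A ∩ B| / (|A| + |B|). The structure-wise part rests on assumptions, since the abstract does not define the new metrics: structures are taken to be 4-connected components of the manual mask, and a structure counts as identified if at least one predicted pixel overlaps it. All function names and the toy 2D masks are hypothetical; real delineations would be 3D volumes.

```python
from collections import deque


def dice_score(truth, pred):
    """Standard Dice score, 2|A & B| / (|A| + |B|), for two binary 2D masks."""
    overlap = sum(t and p for row_t, row_p in zip(truth, pred)
                  for t, p in zip(row_t, row_p))
    total = sum(map(sum, truth)) + sum(map(sum, pred))
    return 2 * overlap / total if total else 1.0


def connected_structures(mask):
    """Return the 4-connected foreground components as sets of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, structures = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            component, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            structures.append(component)
    return structures


def fraction_identified(truth, pred):
    """Share of true structures overlapped by any predicted pixel.

    Assumed criterion for illustration only; not taken from the paper.
    """
    structures = connected_structures(truth)
    hits = sum(any(pred[y][x] for y, x in s) for s in structures)
    return hits / len(structures) if structures else 1.0


# Toy example: two true structures, of which the prediction finds only one.
truth = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
pred = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(round(dice_score(truth, pred), 3))  # 0.667
print(fraction_identified(truth, pred))   # 0.5
```

In this toy case the Dice score alone (about 0.67) does not show that one of the two structures was missed entirely, whereas the structure-wise fraction (0.5) does. This is the kind of complementary information a per-structure metric can add to a voxel-overlap score.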