Prediction of an outcome using NETwork Clusters (NET-C).

Affiliation

Lee JW(1), Zhou J(2), Moen EL(2), Punshon T(3), Hoen AG(4), Romano ME(4), Karagas MR(5), Gui J(6).
Author information:
(1)Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH.
(2)Department of Biomedical Data Science, Geisel School of Medicine, Lebanon, NH.
(3)Department of Biological Sciences, Dartmouth College, Hanover, NH.
(4)Department of Epidemiology, Geisel School of Medicine, Lebanon, NH.
(5)Department of Epidemiology, Geisel School of Medicine, Lebanon, NH. Electronic address: [Email]
(6)Department of Biomedical Data Science, Geisel School of Medicine, Lebanon, NH. Electronic address: [Email]

Abstract

Birth weight is a key consequence of environmental exposures and metabolic alterations and can influence lifelong health. While a number of methods have been used to examine associations of trace element concentrations (including essential nutrients and toxic metals) or metabolite concentrations with birth weight, studies evaluating how the coexistence of these factors affects birth weight are extremely limited. Here, we present a novel algorithm, NETwork Clusters (NET-C), to improve outcome prediction by considering the interactions of features in a network, and we apply this method to predict birth weight by jointly modelling trace element and cord blood metabolite data. Specifically, using trace element and/or metabolite subnetworks as groups, we apply group lasso to estimate birth weight. We conducted simulation studies to examine how sample size and the correlations between grouped features and the outcome affect prediction performance. In terms of prediction error, our proposed method outperformed other methods, including (a) group lasso with groups defined by hierarchical clustering, (b) random forest regression and (c) neural networks. We applied our method to data on trace elements, metabolites and birth outcomes collected as part of the New Hampshire Birth Cohort Study, adjusting for covariates such as maternal body mass index (BMI) and age at enrollment. Our proposed method can be applied to a variety of similarly structured high-dimensional datasets to predict health outcomes.
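
The general idea of using network-derived groups in a group lasso can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the network or its subnetworks are built, so the sketch assumes a thresholded absolute-correlation network whose connected components serve as groups, and it fits the group lasso with a plain proximal gradient loop. The function names (`network_clusters`, `group_lasso`), the threshold, the penalty weight `lam` and the toy data are all hypothetical, and covariate adjustment is omitted.

```python
import numpy as np

def network_clusters(X, threshold=0.5):
    """Assumed grouping rule: connected components of a |correlation| >= threshold network."""
    p = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) >= threshold
    labels = -np.ones(p, dtype=int)
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        # Depth-first search over the adjacency matrix to label one component
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def group_lasso(X, y, labels, lam=0.5, n_iter=500):
    """Proximal gradient descent for
    (1/2n)||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    groups = [np.flatnonzero(labels == g) for g in np.unique(labels)]
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for idx in groups:
            # Block soft-thresholding: shrink the whole group, or zero it out
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(len(idx))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            beta[idx] = scale * z[idx]
    return beta

# Synthetic toy data (not study data): two correlated blocks of three
# features each, plus two independent noise features.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))
X = np.hstack([
    latent[:, [0]] + 0.3 * rng.normal(size=(n, 3)),
    latent[:, [1]] + 0.3 * rng.normal(size=(n, 3)),
    rng.normal(size=(n, 2)),
])
X = (X - X.mean(axis=0)) / X.std(axis=0)
# Only the first block drives the simulated outcome
y = X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=n)
y = y - y.mean()

labels = network_clusters(X, threshold=0.5)
beta = group_lasso(X, y, labels, lam=0.5)
print("group labels:", labels)
print("coefficients:", np.round(beta, 3))
```

In this toy setting the penalty acts on whole groups, so features in a subnetwork unrelated to the outcome are dropped together rather than one at a time. In practice `lam` and the network threshold would be chosen by cross-validation on prediction error.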