School of Computer, Central China Normal University, Wuhan, Hubei, China; Hubei Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, China. Electronic address: [Email]
With advances in high-throughput sequencing technology, massive amounts of microbiome data have accumulated, from environmental investigations to human studies. Microbiome-wide association studies examine the relationship between the microbiome and human health or the environment. Recently, Deep Neural Networks (DNNs) have shown promise owing to their layer-wise representation learning ability. However, DNNs are regarded as black boxes and require large amounts of training data, which makes them impractical to apply directly to microbiome-wide association studies. Moreover, microbiome data are high-dimensional and noisy, and a single feature selection method applied to this kind of dataset is often unstable. In this work, we introduce a deep learning model named Deep Forest to conduct microbiome-wide association studies, and we propose an ensemble feature selection method to guide the identification of microbial biomarkers. Experiments showed that our ensemble feature selection method based on Deep Forest had good stability and robustness. The feature selection results can guide the discovery of microbial biomarkers and help diagnose microbe-related diseases. The code is available at https://github.com/MicroAVA/MWAS-Biomarkers.git.
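To illustrate why aggregating several feature rankings tends to be more stable than relying on a single one, the following is a minimal sketch of ensemble feature selection by bootstrap rank aggregation. It is not the authors' implementation and does not use Deep Forest: the scorer (absolute difference of class means), the function names `score_features`, `ranks_from_scores`, `ensemble_rank`, and `make_toy_data`, and the toy data are all assumptions made for illustration. The repository linked above contains the actual method.

```python
import random
from statistics import mean


def score_features(X, y):
    """Hypothetical stand-in scorer: |mean(class 1) - mean(class 0)| per feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 0 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for rank, j in enumerate(order):
        ranks[j] = rank
    return ranks


def ensemble_rank(X, y, n_rounds=50, seed=0):
    """Rank features by their mean rank over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    rank_sums = [0] * n_features
    done = 0
    while done < n_rounds:
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # the scorer needs both classes; redraw this resample
        Xb = [X[i] for i in idx]
        for j, rank in enumerate(ranks_from_scores(score_features(Xb, yb))):
            rank_sums[j] += rank
        done += 1
    mean_ranks = [s / n_rounds for s in rank_sums]
    # Feature indices, most important (lowest mean rank) first.
    return sorted(range(n_features), key=lambda j: mean_ranks[j])


def make_toy_data(n_samples=40, n_features=5, informative=2, seed=0):
    """Toy data (assumption): one feature is shifted by +5 in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        row[informative] += 5.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(ensemble_rank(X, y))
```

In practice the stand-in scorer would be replaced by importance scores from one or more trained models, and the aggregation rule (mean rank here) is one of several reasonable choices.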