Identification of clinically useful genomic and epigenomic variants
2nd International Conference on Big Data Analysis and Data Mining
November 30-December 01, 2015 San Antonio, USA

Xiong Momiao and Long Ma

The University of Texas School of Public Health, USA

Posters-Accepted Abstracts: J Data Mining Genomics Proteomics

Abstract:

Next-generation sequencing technologies will generate genomic and epigenomic variation data that are unprecedentedly massive (thousands or even tens of thousands of individuals) and high-dimensional (up to hundreds of millions of variants). A fundamental question is how to efficiently extract genomic and epigenomic information of clinical significance. The traditional paradigm for identifying variants of clinical validity is to test the variants for association with the phenotype. However, significantly associated genetic variants may or may not be useful for the diagnosis and prognosis of diseases. An alternative to association studies for finding genetic variants of predictive utility is to systematically search for variants that contain sufficient information for phenotype prediction. To achieve this, we introduce the concept of sufficient dimension reduction (SDR), which projects the original high-dimensional data onto a very low-dimensional space while preserving all information on the response phenotypes. We then formulate the discovery of clinically significant genetic and epigenetic variants as a sparse SDR problem, and we develop algorithms that can select significant genetic variants from up to tens of millions of predictors by dividing the whole-genome SDR problem into a number of sub-SDR problems defined on genomic regions. The sparse SDR problem is in turn formulated as a sparse optimal scoring problem. To speed up computation, we solve the sparse optimal scoring problem with the alternating direction method of multipliers (ADMM), which can easily be implemented in parallel. To illustrate its application, we apply the proposed method to the TCGA overall cancer dataset.
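The abstract does not spell out its exact formulation, so the following is only a minimal sketch under stated assumptions. It assumes a single-component sparse optimal scoring criterion with an L1 penalty (in the style of sparse discriminant analysis): minimize (1/2n)||Y theta - X beta||^2 + lambda ||beta||_1 subject to theta' D theta = 1 and theta' D 1 = 0, where Y is the n x K class-indicator matrix built from a NumPy array of labels and D = Y'Y/n. For fixed theta, the beta-step is a lasso, solved here with ADMM; for fixed beta, the theta-step has a closed form (the centered, rescaled class means of X beta). The names `admm_lasso`, `sparse_optimal_scoring`, and `select_by_region`, as well as the parameters `lam`, `rho`, `n_outer`, and the `(start, stop)` region boundaries, are hypothetical. The region-wise driver simply fits each column block independently; it illustrates why the sub-problems parallelize but does not reproduce how the authors combine sub-SDR results.

```python
import numpy as np


def soft_threshold(v, k):
    """Elementwise soft-thresholding: the proximal operator of k * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_lasso(X, z, lam, rho=1.0, n_iter=200):
    """ADMM for min_b (1/2n)||z - X b||^2 + lam * ||b||_1, splitting b = w."""
    n, p = X.shape
    Xtz = X.T @ z / n
    # Factor (X^T X / n + rho I) once and reuse it in every b-update.
    L = np.linalg.cholesky(X.T @ X / n + rho * np.eye(p))
    w = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable
    for _ in range(n_iter):
        b = np.linalg.solve(L.T, np.linalg.solve(L, Xtz + rho * (w - u)))
        w = soft_threshold(b + u, lam / rho)  # sparse copy of b
        u = u + b - w
    return w


def _center_and_scale(v, d):
    """Enforce theta' D 1 = 0 and theta' D theta = 1, with D = diag(d)."""
    v = v - d @ v  # d sums to 1, so this removes the trivial constant score
    return v / np.sqrt(d @ (v * v))


def sparse_optimal_scoring(X, labels, lam, n_outer=20, rho=1.0, seed=0):
    """Alternate a lasso beta-step (via ADMM) with a closed-form theta-step."""
    n, p = X.shape
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # n x K indicators
    d = Y.mean(axis=0)  # diagonal of D = Y^T Y / n (class proportions)
    init = np.random.default_rng(seed).standard_normal(len(classes))
    theta = _center_and_scale(init, d)
    beta = np.zeros(p)
    for _ in range(n_outer):
        beta = admm_lasso(X, Y @ theta, lam, rho)
        if not np.any(beta):
            break  # penalty too strong: nothing selected
        # Y^T X beta / (n d) is the vector of class means of X beta.
        theta = _center_and_scale(Y.T @ (X @ beta) / (n * d), d)
    return beta, theta


def select_by_region(X, labels, regions, lam):
    """Hypothetical driver: fit each (start, stop) column block independently."""
    selected = []
    for start, stop in regions:
        beta, _ = sparse_optimal_scoring(X[:, start:stop], labels, lam)
        selected.extend(start + int(i) for i in np.flatnonzero(beta))
    return sorted(selected)
```

For example, with a column-standardized predictor matrix `X` and a label array, `select_by_region(X, labels, [(0, 25), (25, 50)], lam=0.3)` returns the sorted column indices selected within two hypothetical regions. Because each region's fit touches only its own columns, the loop could be distributed across workers. The dense p x p Cholesky factorization in `admm_lasso` is only practical for modest region sizes; that is one reason splitting a whole-genome problem into regions helps.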

Biography:

Xiong Momiao completed his Ph.D. in 1993 and his postdoctoral studies at the University of Southern California. He is a Professor in the Division of Biostatistics and the Human Genetics Center at the University of Texas School of Public Health. He has published more than 100 papers and has served as an editorial board member for a number of journals.