See Google Scholar for all my publications. *: corresponding authors.

Machine learning and high-dimensional inference

We develop new algorithms and theory for machine learning methods (such as penalized regression and clustering) and for high-dimensional inference, and we apply them to genetic and genomic datasets.


Related software:

  • prclust: Penalized Regression-Based Clustering Method;

  • aispu: Adaptive interaction sum of powered score (aiSPU) test for testing high-dimensional parameters under generalized linear models (GLMs) with high-dimensional nuisance parameters;

  • GLMaSPU: An adaptive test for testing high-dimensional parameters under generalized linear models (GLMs) with low-dimensional nuisance parameters.

Statistical Genetics/Genomics

Integrative analysis of GWAS and multi-omics data

Transcriptome-wide association studies (TWAS) integrate expression quantitative trait loci (eQTL) data with disease genome-wide association study (GWAS) results to discover gene-trait associations. TWAS has garnered substantial interest and has been widely used to identify novel trait-associated genes. Building on TWAS, we propose several novel methods that integrate gene expression, enhancer-promoter interactions, and brain imaging data with GWAS results.


Related software:

  • IWAS: Imaging-Wide Association Study;

  • TWAS-aSPU: Integrating eQTL and GWAS data;

  • aSPUpath2: Integrating eQTL data with GWAS summary statistics in pathway-based analysis;

  • FOGS: A powerful fine-mapping method that prioritizes putative causal genes by accounting for local linkage disequilibrium (LD) in TWAS results.

DNA methylation data integration

DNA methylation is a widely studied epigenetic mechanism. The Atherosclerosis Risk in Communities (ARIC) study measured DNA methylation at more than 480,000 markers in about 3,000 subjects. I develop new methods to address challenges arising from the ARIC DNA methylation data.


Related software:

  • CMO: Cross Methylome Omnibus (CMO) integrates genetically regulated DNAm in enhancers, promoters, and the gene body to identify additional disease-associated genes.

Human microbiome data analysis

The human body hosts more than ten times as many microbes as human cells. These microorganisms play an important part in our overall health, for example by protecting us from disease and helping us digest food. We develop new methods to test for association between human microbiome diversity and a trait of interest.

  • Wu, C., Chen, J., Kim, J., and Pan, W. (2016).
    An adaptive association test for microbiome data. Genome Medicine, 8(1):1–12.
    (This paper won a 2016 Joint Statistical Meetings (JSM) Distinguished Student Paper Award from the Section on Statistics in Genomics and Genetics.)

Related software:

  • MiSPU: Microbiome Based Sum of Powered Score (MiSPU) Tests.

Applied studies


Other works

Collaborative research

I am involved in many applied studies and enjoy collaborative research. Specifically, I have collaborated with epidemiologists and developed a series of papers studying the genetic basis of prostate cancer, pancreatic cancer, Alzheimer's disease, and COVID-19.


Acknowledgment: My research is/was supported by NIH, seed grants at Florida State University, and the University of Minnesota Doctoral Dissertation Fellowship.

All photos are from Google Images.