Machine learning

I develop new algorithms and theory for machine learning methods, and I apply machine learning to problems in genetics and genomics. Specifically, I have worked on penalized regression and clustering.


Methods and applications in genetics and genomics

Integrative analysis of GWAS and other omic data

Recently proposed gene-based GWAS tests first impute gene expression from genotypes and then test the association between the imputed expression and a trait in a GWAS dataset. We propose more powerful tests that incorporate external information.


Related software:

  • IWAS: Imaging-Wide Association Study;

  • TWAS: Integrating eQTL and GWAS data;

  • aSPUpath2: Integrating eQTL data with GWAS summary statistics in pathway-based analysis.
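
The two-stage idea behind these gene-based tests (impute expression, then test it against the trait) can be sketched as follows. This is a minimal illustration, not the implementation in the TWAS, IWAS, or aSPUpath2 software. It assumes stage 1 is a ridge regression of expression on cis-SNPs in an eQTL reference panel, uses simulated genotypes, and computes the p-value with a normal approximation; the helper names `fit_ridge` and `twas_test` and the penalty `lam` are hypothetical.

```python
import math
import numpy as np

def fit_ridge(G, e, lam=1.0):
    """Stage 1 (assumed ridge model): weights for predicting expression e from genotypes G."""
    Gc = G - G.mean(axis=0)
    ec = e - e.mean()
    return np.linalg.solve(Gc.T @ Gc + lam * np.eye(G.shape[1]), Gc.T @ ec)

def twas_test(G, y, w):
    """Stage 2: regress trait y on imputed expression G @ w; return (z, two-sided p)."""
    x = (G - G.mean(axis=0)) @ w          # imputed (centred) expression
    yc = y - y.mean()
    sxx = x @ x
    beta = (x @ yc) / sxx
    resid = yc - beta * x
    se = math.sqrt((resid @ resid) / (len(y) - 2) / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # normal approximation, reasonable for large n
    return z, p

# Simulated example (all values hypothetical).
rng = np.random.default_rng(0)
n_snps = 20
w_true = np.zeros(n_snps)
w_true[:3] = [0.8, -0.5, 0.4]

G_ref = rng.binomial(2, 0.3, size=(500, n_snps)).astype(float)    # eQTL reference panel
e_ref = G_ref @ w_true + rng.normal(size=500)
w_hat = fit_ridge(G_ref, e_ref, lam=5.0)

G_gwas = rng.binomial(2, 0.3, size=(2000, n_snps)).astype(float)  # GWAS sample, expression unmeasured
y = 0.3 * (G_gwas @ w_true) + rng.normal(size=2000)
z, pval = twas_test(G_gwas, y, w_hat)
print(f"z = {z:.2f}, p = {pval:.2e}")
```

Because the GWAS sample never needs measured expression, only genotypes and the stage-1 weights, this design is what lets such tests be applied to existing GWAS data.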

DNA methylation data analysis

DNA methylation is a widely studied epigenetic mechanism. The Atherosclerosis Risk in Communities (ARIC) study measured DNA methylation at more than 480,000 markers in about 3,000 subjects. I develop new methods to address the challenges arising from the ARIC DNA methylation data.


Human microbiome data analysis

It has been estimated that the human body carries more than 10 times as many microbes as human cells. These microorganisms play an important role in overall health, for example by protecting against disease and helping to digest food. I develop new methods to test for association between human microbiome diversity and a trait.

  • Wu, C., Chen, J., Kim, J., and Pan, W. (2016).
    An adaptive association test for microbiome data. Genome Medicine, 8(1):1–12.
    (IF: 7.1. This paper won the 2016 Joint Statistical Meetings (JSM) Distinguished Student Paper Award from the Section on Statistics in Genomics and Genetics.)
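
To make "association of microbiome diversity with a trait" concrete, here is a deliberately simple baseline. It is not the adaptive test from the paper above: it summarizes each sample by its Shannon diversity index and compares cases and controls with a permutation test. The simulated Dirichlet-multinomial counts and the function names `shannon_diversity` and `permutation_test` are hypothetical.

```python
import numpy as np

def shannon_diversity(counts):
    """Shannon index H = -sum_k p_k log p_k per sample (rows = samples, columns = taxa)."""
    props = counts / counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(props > 0, props * np.log(props), 0.0)  # treat 0 * log 0 as 0
    return -terms.sum(axis=1)

def permutation_test(h, trait, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean diversity between trait groups."""
    rng = np.random.default_rng(seed)

    def stat(t):
        return abs(h[t == 1].mean() - h[t == 0].mean())

    observed = stat(trait)
    exceed = sum(stat(rng.permutation(trait)) >= observed for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)

# Simulated counts: controls (trait 0) have uneven communities, cases (trait 1) more even ones.
rng = np.random.default_rng(1)
n_taxa = 50
trait = np.repeat([0, 1], 50)
counts = np.array([
    rng.multinomial(1000, rng.dirichlet(np.full(n_taxa, 0.2 if t == 0 else 1.0)))
    for t in trait
])
h = shannon_diversity(counts)
p_value = permutation_test(h, trait)
print(f"mean H: controls {h[trait == 0].mean():.2f}, cases {h[trait == 1].mean():.2f}; p = {p_value:.4f}")
```

A single diversity summary discards which taxa differ, which is one motivation for tests that work on many taxa at once.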

Other works

Hypothesis tests on high-dimensional parameters in generalized linear models (GLMs)

I have developed new adaptive tests for high-dimensional parameters in GLMs, in the presence of either low-dimensional or high-dimensional nuisance parameters.

  • Wu, C.*, Xu, G., and Pan, W.* (2017+).
    An adaptive test on high dimensional parameters in generalized linear models. Accepted by Statistica Sinica. (* Corresponding author)

  • Wu, C., Xu, G., Shen, X., and Pan, W. (2017+).
    An adaptive test on a high-dimensional parameter in the presence of a high-dimensional nuisance parameter in GLM with application to detect gene-environment interactions. Manuscript.
    (Job talk manuscript, to be submitted to Journal of the American Statistical Association.)
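
An adaptive test of this kind can be sketched with the sum-of-powered-score (SPU) family: compute the score vector U for the parameter of interest under the null model, form SPU(gamma) = sum_j U_j^gamma for several powers gamma (gamma = infinity meaning max_j |U_j|), and combine them adaptively by taking the minimum p-value across gamma (aSPU). The sketch below makes strong simplifying assumptions and is not the published method: a Gaussian linear model (the identity-link special case of a GLM), an intercept as the only nuisance parameter, and a permutation null distribution in place of asymptotic theory. The gamma set, helper names, and simulated data are illustrative.

```python
import numpy as np

def spu_stats(U, gammas):
    """|SPU(gamma)| = |sum_j U_j**gamma| for finite gamma; SPU(inf) = max_j |U_j|."""
    return np.array([np.max(np.abs(U)) if np.isinf(g) else abs(np.sum(U ** g))
                     for g in gammas])

def aspu_test(y, X, gammas=(1, 2, 4, 8, np.inf), n_perm=1000, seed=0):
    """Test H0: beta = 0 in y = a + X @ beta + noise (intercept-only nuisance).

    Returns (permutation p-value of each SPU(gamma), aSPU p-value).
    """
    rng = np.random.default_rng(seed)
    r0 = y - y.mean()  # residuals from the null model (intercept only)
    # Row 0: observed statistics; rows 1..n_perm: statistics from permuted residuals.
    T = np.array([spu_stats(X.T @ r0, gammas)] +
                 [spu_stats(X.T @ rng.permutation(r0), gammas) for _ in range(n_perm)])
    # P[b, k] = fraction of rows whose SPU(gamma_k) is at least that of row b.
    P = (T[None, :, :] >= T[:, None, :]).mean(axis=1)
    min_p = P.min(axis=1)  # aSPU statistic: smallest p-value over gammas, per row
    aspu_p = (np.sum(min_p[1:] <= min_p[0]) + 1) / (n_perm + 1)
    return P[0], aspu_p

# Simulated example: 100 candidate coefficients, only 5 nonzero (hypothetical values).
rng = np.random.default_rng(2)
n, p = 300, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 0.5
y = 1.0 + X @ beta + rng.normal(size=n)
spu_p, aspu_p = aspu_test(y, X)
print(spu_p, aspu_p)
```

Small gamma favors dense, same-signed signals and large gamma favors sparse ones; taking the minimum p-value over gamma lets the data choose, and reusing the same permutations to calibrate that minimum keeps the overall test valid.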

Collaborative research

For me, collaborative research is a fun and fulfilling experience. If you have an interesting applied project and need help analyzing the data, please feel free to contact me.

  • Zhu, L., Li, Y., Chen, Y., Carrera, C., Wu, C., and Fork, A. (2018).
    Comparison between two post-dentin bond strength measurement methods. Scientific Reports, 8(1):2350. (IF: 4.3)

  • Nguyen, S., Guan, W., Wu, C., Grove, M.L., Xia, R., Roetker, N., Holliday, K., Hibler, E., Zheng, Y., Whitsel, E., Bressler, J., North, K.E., Fornage, M., Boerwinkle, E., Pankow, J.S., and Demerath, E.W. (2017+).
    Epigenome-wide association study of moderate-vigorous physical activity in African-American adults. Submitted.

Acknowledgment: this research is/was supported by NSF, NIH, and the University of Minnesota Doctoral Dissertation Fellowship.
