IWAS

Imaging-Wide Association Study

IWAS is a new tool for performing an image-wide association study by incorporating imaging endophenotypes as weights into GWAS using either individual level data or only summary statistics. The goal is to identify new genes that are associated with a trait of interest in GWAS. We provide precomputed external weights (based on gray matter volumes as the endophenotype) from the ADNI data to facilitate analysis. Please cite the following manuscript for using the IWAS method & data:

Xu, Z., Wu, C., Pan, W., and Alzheimer’s Disease Neuroimaging Initiative (ADNI). (2017). Imaging-wide association study: integrating imaging endophenotypes in GWAS. NeuroImage, 159:159–169.

For questions or comments regarding methods, contact Wei Pan (weip@biostat.umn.edu); For questions or comments regarding data & codes, contact Chong Wu (chongwu@umn.edu).

Installation

• Confiugre the necessary environment and install plink.

git clone https://github.com/ChongWu-Biostat/IWAS.git

• Download the reference data, for example, 1000 Genomes, HapMap. Here we provide HapMap 3 with independent European reference. Use this link to download. Put it into the same working directory.

• Launch R and install required libraries:

if (!require("devtools")) {
install.packages("devtools")
library(devtools)
}

devtools::install_github("ChongWu-Biostat/aSPU2")


Typical analysis and output

The IWAS analysis takes imaging endophenotype based external weights and disease GWAS summary statistics to identify significant genes.

We will use the IGAP Alzheimer's summary data (Lambert et al. 2013) as an example to illustrate how to use our methods. This example assumes you have setup the required environment and data as illustrated in the previous section. All analyses are based on R/3.3.1.

Input: GWAS summary statistics

At a minimum, we need a summary rds file with a header row containing the following fields:

1. SNP_map – SNP identifier (CHR:BP)

2. A1 – first allele (effect allele, should be capitalized)

3. A2 – second allele (other allele, should be capitalized)

4. Z – Z-scores, sign with respect to A1.

Additional columns are allowed but will be ignored. We recommend removing the additional columns before analysis to save space.

Note: The performance of IWAS depends on the density of summary-level data. We highly recommend running IWAS with raw summary-level data. Pre-process step such as pruning and restricting to top SNPs may harm the performance.

Input: external weights

We pre-computed the external weights based on ADNI-1 or ADNI GO2 data. For each gene, in addition to its coding region, all variants within the 1 MB upstream and downstream regions of the transcription startending sites were included. The external weights are stored in the .WEIGHTS which provides the external weights for each gene, and their corresponding physical positions.

Performing the IWAS

After we prepared the data, we can run IWAS via the following single line.

Rscript IWAS.R \
--sumstats ./Example/IGAP_chr22.rds \
--out ./Example/example_res.rds \
--ref_ld ./LDREF/hapmap_CEU_r23a_hg19 \
--gene_list ./Example/gene_list.txt \
--test_type aSPU \
--weight_type ST31CV


This should take around one or two minutes, and you will see some intermediate steps on the screen. If everything works, you will see a file example_res.rds under the .Example and output.txt in the working directory.

Through IWAS.R, we perform the following steps: (1) combine the GWAS and reference SNPs and remove ambiguous SNPs. (2) impute GWAS Z-scores for any reference SNPs with missing GWAS Z-scores via the IMPG algorithm ([https:academic.oup.combioinformaticsarticle302029062422225/Fast-and-accurate-imputation-of-summary-statistics Pasaniuc et al. 2014); (3) perform IWAS; (4) report results and store them in the working directory.

Output: Gene-disease association

The results are stored in a user-defined output file. For illustration, we explain the meaning of each entry in the first two lines of the output.

 Col. num. Column name Value Explanations 1 gene A4GALT Feature/gene identifier, taken from gene_list file 2 CHR 22 Chromosome 3 P0 41088126 Gene start (from hg19 list from plink website) 4 P1 43116876 Gene end (from hg19 list from plink website) 5 #nonzero_SNPs 5 Number of non-zero weight SNPs 6 TWAS_asy 0.69 TWAS p-value with imaging based external weight. The p-value is based on asymptotic distribution. 7 SSU_asy 0.61 SSU p-value with imaging based external weight. The p-value is based on asymptotic distribution. 8-23 SPU 0.69 SPU or aSPU p-values. The results are based on simulations.

Note: We only store the results for genes with external weights. The genes without external weights will be ignored in the output file.

External Weights

We put some pre-computed external weights on GitHub. You can compute your own weights as well. To facilitate this step, we put our source file into the folder .Weight/conWeights.R. Our method is based on elastic net, and you can use other methods to compute the weights. Currently, we provide the pre-computed weights for the following 14 imaging endophenotypes. You can define which image endophenotype you want to use by weight_type options. The following table describes the values and corresponding meanings for --weight_type option.

 left right ROI name ST31CV ST90CV Inferior parietal ST32CV ST91CV Inferior temporal ST39CV ST98CV Medial orbitofrontal ST44CV ST103CV Parahippocampal ST52CV ST111CV Precuneus T50CV ST109CV Posterior cingulate ST29SV ST88SV Hippocampus

Further Analyses

Testing for effect in multiple external weights

There may be compelling reasons to take advantage of multiple sets of weights based on multiple correlated endophenotypes. First, the statistical advantages of joint analysis of multiple traits include possibly increasing statistical power and more precise parameter estimates, alleviating the burden of multiple testing. Biologically, joint analysis of multiple traits addresses the issue of pleiotropy (i.e. one locus influencing multiple traits), providing biological insight into molecular mechanisms underlying the disease or trait. Second, the above conclusions are expected to carry over to the current context of analysis of multiple endophenotypes. In our current version of the software, we provide an adaptive test, called daSPU, which can combine the information from multiple sets of weights simultaneously. To use daSPU, we can simply set --test_type option to daSPU.

Testing with the individual level GWAS data

IWAS can be applied to individual level GWAS data as well. The main idea is the same. Specifically, we can calculate the score and its corresponding covariance matrix for SNPs in the gene regions. Then we can apply aSPU2 package with pre-computed imaging endophenotype based external weights provided on this website. Since there some restrictions sharing individual level GWAS data, we do not provide any example regarding applying IWAS to individual level data. However, this should be a relatively easy task.

Command-line parameters

IWAS.R

 Flag Usage Default --sumstats ummary statistics (rds file and must have SNP and Z column headers) Required --out Path to output file Required --weights File listing molecular weight (rds files and must have columns ID,CHR,P0,P1, Weights) Required --ref_ld Reference LD files in binary PLINK format Required --gene_list Gene sets we want to analyze, currently only gene sets from a single chromosome are supported Required --test_type Test we want to perform aSPU --weight_type Weight we want to use ST31TA --Rsq_cutoff R square cutoff for genes we want to analyze 0.01 --max_nperm maximum number of permutation for aSPU or daSPU 1000000

note 1: We only want to analyze the genes with relatively informative external weights. To do that, we calculated the squared Person correlation, $$r^2$$, between the predicted and observed endophenotype values in the dataset, and selected only those genes with $$r^2 > 0.01$$ (--Rsq_cutoff). To facility your analyses when you are constructing your own weights, we make the cutoff as an option.

note 2: A single layer/loop of Monte Carlo simulations is used to obtain the $$p$$-values of all the SPU, aSPU, and daSPU tests simultaneously. we use an adaptive way to select the number of simulations and calculate $$p$$-values efficiently. --max_nperm is the upper bound for the number of simulations.

FAQ

IWAS looks similar to TWAS and PrediXcan. What's the difference between them?

Differing from that TWAS (Gusev et al. 2016) and PrediXcan (Gamazon et al. 2015) use gene expression as the endophenotype, IWAS uses imaging endophenotypes, which might be more powerful when analyzing some neuro degenerative or psychiatric disorders, such as AD. By noting that TWAS and PrediXcan are the same as a weighted Sum test with gene expression based weights, we propose to use aSPU, a more powerful and adaptive test, to conduct association testing. Since aSPU covers the (weighted) Sum test as a special case, we can get the results for TWAS or PrediXcan after running our method as well. We demonstrate that our new method is more powerful and identify some new genes in some real data analyses. We expect this to be generally true and hope you can let us know your results if you apply both TWAS and IWAS on your real data analyses.

What related software is available?

Our methods are related to the following method by other two groups. Two methods are highly correlated with two well-known groups. MetaXcan and PrediXcan (Gamazon et al 2015) developed by the Im lab perform gene-based association tests with and without summary data. TWAS by Gusev performs gene-based association tests with individual-level or summary statistics. MetaXcan and TWAS are similar. The main difference between TWAS and MetaXcan is that they use a slightly different way and different reference data to construct weights. We recommend you to check their websites to download other gene-expression based weights as well.

Can I use other reference data?

Yes. Other references such as 1000 Genomes, can be used as well. We will upload the 1000 Genomes with only HapMap SNPs set in the near future. Please check this website later for updating.

What QC is performed internally when using IWAS?

IWAS performs a similar quality control steps as TWAS did. We automatically match up SNPs, remove ambiguous markers (A/C or G/T) and flip alleles to match the reference data.

Acknowledgements

This research was supported by NIH grants R21AG057038, R01HL116720, R01GM113250 and R01HL105397, and by the Minnesota Supercomputing Institute; ZX was supported by a University of Minnesota MnDRIVE Fellowship and CW by a University of Minnesota Dissertation Fellowship.

We are grateful to the external reference data, and all of the hard work involved in the study. Without the efforts of these groups and sharing the individual level data, we are unable to construct the external weights, and our methods will be useless.

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimers Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Ho↵mann-La Roche Ltd and its a liated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Rev December 5, 2013 Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

References

• Lambert, JC., Ibrahim-Verbaas, CA., Harold, D., Naj, AC., Sims, R., Bellenguez, C., DeStafano, AL., Bis, JC., Beecham, GW., Grenier-Boley, B., et al. (2013) Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1460.

• Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., Hirschhorn, J., Strachan, D.P., Patterson, N. and Price, A.L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906- 2914.

• Gusev, A. et al. (2016) Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 48, 245-252.

• Gamazon, E.R. et al. (2015) A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091-1098.

• Alvaro Barbeira, Kaanan P Shah, Jason M Torres, Heather E Wheeler, Eric S Torstenson, Todd Edwards, Tzintzuni Garcia, Graeme I Bell, Dan Nicolae, Nancy J Cox, Hae Kyung Im. (2016) MetaXcan: Summary Statistics Based Gene-Level Association Method Infers Accurate PrediXcan Results.