GWAS

Genomic inflation factor calculation

In GWAS, a common way to investigate if there are any systematic biases that may be present in your association results  is to calculate the genomic inflation factor, also known as lambda gc (λgc).

The genomic inflation factor λgc  is defined as the ratio of the median of the empirically observed distribution of the test statistic to the expected median, thus quantifying the extent of the bulk inflation and the excess false positive rate. In other words, λgc is defined as the median of the resulting chi-squared test statistics divided by the expected median of the chi-squared distribution. The median of a chi-squared distribution with one degree of freedom is 0.4549364 (qchisq(0.5,1) in R). A λgc value can be calculated from z-scores, chi-square statistics, or p-values, depending on the output you have from the association analysis.

This program will calculate the λ and λ1000 for  logistic regression output generated by plink and make the qqplot

args <- commandArgs(trailingOnly = TRUE)
file.name <- args[1]
# the prefix of the assoc.logistic file and the fam file
# here suppose prefix.fam prefix.assoc.logistic

fam.file  <- read.table(paste(file.name, 'fam', sep='.'), stringsAsFactors=F)
case <- sum(fam.file[,6] == 2)
control <-  sum(fam.file[,6] == 1)

assoc.file <- paste(file.name, 'assoc.logistic', sep='.')
assoc.df <- read.table(assoc.file, header=T, stringsAsFactors=F)
# non-factor is important
assoc.df = assoc.df[complete.cases(assoc.df), ]
# remove NA lines
assoc.df$CHISQ <- qchisq(assoc.df$P,1,lower.tail=FALSE)
# qchisq(assoc.df$P,1,lower.tail=FALSE) can convert p-value (even < 5.5e-17) to chisq
# while qchisq(1-assoc.df$P,1) fails to convert small p-value (Inf in this case) 

lambda <- median(chisq) / qchisq(0.5,1)
lambda1000 <- 1 + (lambda - 1) * (1 / case + 1 / control) * 500

# qchisq(0.5,1) will give 0.4549364
# some people directly / 0.454 which may cause slightly different result
# chisq = z**2
# z <- qnorm(p-value/2)

#################### qqplot #####################
# there are various way to make qqplot, the quickest way is qqman package with less controls
# qq(assoc.df$P)

# draw qqplot step by step
m <- length(assoc.df$P)
#x <- rev(qchisq(1-(1:m)/(m+1), 1))</pre>
<pre>x <- rev(qchisq(1-(1:m)/(m), 1))</pre>
<pre># small to large y <- assoc.df$CHISQ # small to large qqplot(x, y, xlab="Expected chi-squared value", ylab="Observed chi-squared value",        main = "qq-plot",        sub = bquote(lambda ~ '=' ~ .(signif(lambda, 4)) ~~ lambda[1000] ~ '=' ~.(signif(lambda1000, 4))),        cex.lab=1.2,        cex.axis=1.2,        font.lab=2,        font.axis=2,        lwd=1.3        ) # plot() will work also abline(0, 1, col='red') <code>

 

qqplot.png

One thought on “Genomic inflation factor calculation

Leave a comment