Linux

awk regex pattern

In SNPTEST output, the header lines contain "#" and "alternate_ids", which I don't want to process. With awk, it is quite simple to skip these lines with a regex: cat snptest.out | awk '!/#|alternate_ids/ {print $0}'   Setting OFS="\t" makes tab the output separator. NR is the line number; if you know how many lines you don't want in…
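As a minimal sketch of the filter described above, the snippet below runs it on a small made-up file. The sample rows and the variable names (sample, filtered, by_line_number) are hypothetical, not real SNPTEST output, and the NR example assumes exactly two header lines.

```shell
# Hypothetical sample input standing in for real SNPTEST output.
sample=$(mktemp)
printf '%s\n' \
  '# Analysis: demo' \
  'alternate_ids rsid chromosome position' \
  'id1 rs1 01 1000' \
  'id2 rs2 01 2000' > "$sample"

# Drop every line containing "#" or "alternate_ids".
# $1 = $1 forces awk to rebuild the record, so the tab OFS is applied.
filtered=$(awk -v OFS='\t' '!/#|alternate_ids/ { $1 = $1; print }' "$sample")
printf '%s\n' "$filtered"

# Alternative when the number of header lines is known
# (assumed to be 2 for this sample): skip them by line number.
by_line_number=$(awk 'NR > 2' "$sample")
printf '%s\n' "$by_line_number"

rm -f "$sample"
```

Without the $1 = $1 assignment, awk prints each record unchanged and the OFS setting has no visible effect.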

Linux

Download 1000 Genomes data using Aspera

1. Download the Aspera browser plugin and install it.
2. By default on Linux, it creates ~/.aspera/
3. ~/.aspera/connect/bin/ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh -Tr -Q -l 100M -L- fasp-g1k@fasp.1000genomes.ebi.ac.uk:vol1/ftp/phase3/data/HG00133/sequence_read/SRR035484_1.filt.fastq.gz ./fastq/
Changing 100M to 300M raises the download speed limit from 100M/s to 300M/s. On average, the actual speed goes from 97M/s to 297M/s.

GWAS

PCA with shellfish

Assuming shellfish and all other related software have been installed correctly and shellfish.py exists, prepare a PBS script. Here I call it Shellfish.pbs, and my PLINK files are called ABC.bim, ABC.bed and ABC.fam.
cat Shellfish.pbs
#!/bin/bash
#PBS -N shellfish
#PBS -S /bin/bash
#PBS -j oe
#PBS -l walltime=24:00:00
#PBS -l ncpus=20
#PBS -l mem=100G
hostname
cd…

R

R X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 12 could not be loaded

Recently I came across a problem in R (3.2.3 and 3.2.4). When I type plot(1:5, 1:5) in R, I get the message: X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 12 could not be loaded, and all the x-axis and y-axis labels are missing. My R sessionInfo() is: R version 3.2.3 (2015-12-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise…

GWAS

Genomic inflation factor calculation

In GWAS, a common way to investigate whether systematic biases are present in your association results is to calculate the genomic inflation factor, also known as lambda gc (λgc). The genomic inflation factor λgc is defined as the ratio of the median of the empirically observed distribution of the test statistic…
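As a rough illustration of this ratio, the following assumes the test statistics are chi-square distributed with 1 degree of freedom under the null (an assumption, since the excerpt does not state the test). The expected median is then about 0.4549, and the observed median of 0.50 is a made-up value for the arithmetic.

```latex
\lambda_{gc}
  = \frac{\operatorname{median}\left(\chi^2_{\text{observed}}\right)}
         {\operatorname{median}\left(\chi^2_{1}\right)}
  \approx \frac{\operatorname{median}\left(\chi^2_{\text{observed}}\right)}{0.4549},
\qquad
\text{e.g. } \frac{0.50}{0.4549} \approx 1.10
```

A value close to 1 suggests little inflation, while larger values point to inflated test statistics.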