Download Annotations

For this workshop, we have also downloaded all relevant annotation files from public databases (the UCSC Genome Browser and GENCODE) and placed them in the /work3/NRPB1219 folder.

# Console output

-rw-r--r-- 1 s00yao00 s00yao00 2.2K 2008-09-05 18:13 /work3/NRPB1219/chromInfo.txt
-rw-r--r-- 1 s00yao00 s00yao00 1.6M 2006-04-14 02:39 /work3/NRPB1219/cpgIslandExt.txt
-rw-r--r-- 1 s00yao00 s00yao00 653M 2014-12-16 16:01 /work3/NRPB1219/gencode.v3c.annotation.NCBI36.gtf
-rw-r--r-- 1 s00yao00 s00yao00 289M 2014-10-20 02:54 /work3/NRPB1219/wgEncodeRegTfbsClustered.txt

Readers who do not have an account on the ALPS server can download and decompress these files themselves:

wget ftp://ftp.sanger.ac.uk/pub/gencode/release_3c/gencode.v3c.annotation.NCBI36.gtf.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/cpgIslandExt.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/wgEncodeRegTfbsClustered.txt.gz
gunzip gencode.v3c.annotation.NCBI36.gtf.gz chromInfo.txt.gz cpgIslandExt.txt.gz wgEncodeRegTfbsClustered.txt.gz
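Before moving on, it can help to confirm that every file was downloaded and decompressed. The following is a minimal sketch, assuming the four files sit in a single directory; the function name `check_annotations` is hypothetical and is not part of the workshop materials.

```shell
#!/bin/sh
# Hypothetical helper (not part of the workshop scripts): report whether each
# expected annotation file exists and is non-empty in directory $1 (default: .).
check_annotations() {
    dir="${1:-.}"
    missing=0
    for f in gencode.v3c.annotation.NCBI36.gtf chromInfo.txt \
             cpgIslandExt.txt wgEncodeRegTfbsClustered.txt; do
        # -s is true only if the file exists and has a size greater than zero
        if [ -s "$dir/$f" ]; then
            echo "OK      $f"
        else
            echo "MISSING $f"
            missing=$((missing + 1))
        fi
    done
    # Return non-zero if any file is absent or empty
    [ "$missing" -eq 0 ]
}
```

For example, `check_annotations .` checks the current directory, and `check_annotations /work3/NRPB1219` checks the shared copies on the ALPS server.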

The repeating elements annotation (RepeatMasker) provided by the UCSC Genome Browser for hg18 is split into individual chromosomes, i.e. one data file per chromosome. We will use a shell script to demonstrate batch downloading these files and joining them into a single file for easier manipulation.
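The contents of download_rmsk.sh are not reproduced here. As a rough illustration of the joining step only, the sketch below assumes the per-chromosome UCSC RepeatMasker tables (named like chr1_rmsk.txt.gz) have already been downloaded, and that each follows the standard UCSC rmsk column layout (genoName, genoStart, genoEnd and strand in columns 6, 7, 8 and 10; repName in column 11). The function name `join_rmsk` and the choice of BED fields (repName as the name, score fixed at 0) are assumptions; the real script may produce different columns.

```shell
#!/bin/sh
# Hypothetical sketch of the "join" step: merge per-chromosome UCSC rmsk
# tables into one gzipped BED6 file. Assumes the standard UCSC rmsk columns:
#   $6 genoName, $7 genoStart, $8 genoEnd, $10 strand, $11 repName
# Usage: join_rmsk OUTPUT.bed.gz INPUT1.txt.gz [INPUT2.txt.gz ...]
join_rmsk() {
    out="$1"
    shift
    # Decompress all inputs as one stream, emit BED6 (name = repName,
    # score = 0), sort by chromosome then start coordinate, and recompress.
    gzip -dc "$@" \
        | awk 'BEGIN { FS = OFS = "\t" } { print $6, $7, $8, $11, 0, $10 }' \
        | LC_ALL=C sort -k1,1 -k2,2n \
        | gzip -c > "$out"
}
```

For example, `join_rmsk hg18.rmsk.joined.bed.gz chr*_rmsk.txt.gz` would merge every downloaded chromosome file in the current directory (the output name here is illustrative only). Sorting after concatenation means the input order of the per-chromosome files does not matter.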

cd ~/Data
wget --no-check-certificate https://raw.githubusercontent.com/ycl6/MethylationWorkshop2014/master/download_rmsk.sh
sh download_rmsk.sh

Use ls to check that the joined file is in the "Data" folder.

ls -la ~/Data/hg18.rmskRM327.bed.gz

# Console output

-rw------- 1 s00yao00 s00yao00 52052411 2014-12-20 19:39 /home/s00yao00/Data/hg18.rmskRM327.bed.gz
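Beyond ls, a quick look inside the compressed file confirms that it decompresses cleanly and contains BED-like text. The sketch below is hypothetical (`count_per_chrom` is not part of the workshop scripts); it assumes the first column holds the chromosome name, as in the standard BED layout, and counts records per chromosome without writing a decompressed copy to disk.

```shell
#!/bin/sh
# Hypothetical helper: count records per chromosome (column 1) in a
# gzipped, tab-separated BED file. Prints "chrom<TAB>count" lines.
count_per_chrom() {
    gzip -dc "$1" \
        | cut -f1 \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}
```

For example, `count_per_chrom ~/Data/hg18.rmskRM327.bed.gz` summarises the joined file, and `gzip -dc ~/Data/hg18.rmskRM327.bed.gz | head -n 5` shows its first few records.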
