DNA Methylation Sequencing Analysis

Download Annotations

For this workshop, we have also downloaded all relevant annotation files from public databases (UCSC Genome Browser and Gencode) and placed them in the /work3/NRPB1219 folder.

# Console output

-rw-r--r-- 1 s00yao00 s00yao00 2.2K 2008-09-05 18:13 /work3/NRPB1219/chromInfo.txt
-rw-r--r-- 1 s00yao00 s00yao00 1.6M 2006-04-14 02:39 /work3/NRPB1219/cpgIslandExt.txt
-rw-r--r-- 1 s00yao00 s00yao00 653M 2014-12-16 16:01 /work3/NRPB1219/gencode.v3c.annotation.NCBI36.gtf
-rw-r--r-- 1 s00yao00 s00yao00 289M 2014-10-20 02:54 /work3/NRPB1219/wgEncodeRegTfbsClustered.txt

Readers who do not have an account on the ALPS server can download and uncompress these files as follows.

wget ftp://ftp.sanger.ac.uk/pub/gencode/release_3c/gencode.v3c.annotation.NCBI36.gtf.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/chromInfo.txt.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/cpgIslandExt.txt.gz    
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/wgEncodeRegTfbsClustered.txt.gz
gunzip gencode.v3c.annotation.NCBI36.gtf.gz chromInfo.txt.gz cpgIslandExt.txt.gz wgEncodeRegTfbsClustered.txt.gz
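After downloading, it can be worth confirming that all four annotation files are present and non-empty before moving on. The following is a minimal sketch rather than part of the workshop material; `check_annotations` is a hypothetical helper name, and the only thing it assumes is the four file names listed above.

```shell
# check_annotations DIR
# Hypothetical helper (not part of the workshop scripts): reports whether each
# of the four annotation files exists in DIR and is non-empty.
# Returns 0 if all files are present, 1 otherwise.
check_annotations() {
    dir=$1
    status=0
    for f in chromInfo.txt cpgIslandExt.txt \
             gencode.v3c.annotation.NCBI36.gtf wgEncodeRegTfbsClustered.txt; do
        if [ -s "$dir/$f" ]; then      # -s: file exists and has size > 0
            echo "ok       $f"
        else
            echo "MISSING  $f"
            status=1
        fi
    done
    return $status
}
```

For example, `check_annotations .` checks the current directory, and `check_annotations /work3/NRPB1219` checks the shared folder on the server.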

The repeat element annotation provided by the UCSC Genome Browser is split by chromosome, i.e. one data file per chromosome. We will use a shell script to demonstrate batch downloading and joining of the data files into a single file for ease of manipulation.

cd ~/Data
wget --no-check-certificate https://raw.githubusercontent.com/ycl6/MethylationWorkshop2014/master/download_rmsk.sh
sh download_rmsk.sh
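The general idea of batch downloading and joining can be sketched as below. This is not the content of `download_rmsk.sh`; it is an illustration under several assumptions: the per-chromosome file naming (`chrN_rmskRM327.txt.gz`) is inferred from the output file name, the column positions follow the usual UCSC RepeatMasker table layout (score in column 2, chromosome, start and end in columns 6 to 8, strand in column 10, repeat name in column 11), and only chr1 to chr22, chrX and chrY are covered. The function names are hypothetical.

```shell
# Print the chromosome names to fetch: chr1..chr22, chrX, chrY.
list_chroms() {
    i=1
    while [ "$i" -le 22 ]; do
        echo "chr$i"
        i=$((i + 1))
    done
    echo chrX
    echo chrY
}

# Read tab-separated RepeatMasker table rows on stdin and write 6-column BED
# (chrom, start, end, repeat name, score, strand) to stdout.
# Column positions are an assumption about the table layout.
rmsk_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" } { print $6, $7, $8, $11, $2, $10 }'
}

# fetch_rmsk OUTFILE [TABLE]
# Download one file per chromosome, convert each to BED and join everything
# into a single gzipped file. TABLE defaults to rmskRM327 (assumed name).
fetch_rmsk() {
    out=$1
    table=${2:-rmskRM327}
    base=http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database
    for chrom in $(list_chroms); do
        wget -q -O - "$base/${chrom}_${table}.txt.gz" | gunzip -c | rmsk_to_bed
    done | gzip -c > "$out"
}
```

Under these assumptions, `fetch_rmsk hg18.rmskRM327.bed.gz` would produce one joined file. Streaming each download through `gunzip` and `awk` avoids keeping 24 intermediate files on disk; note that this sketch does not stop if a single download fails.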

Use ls to check that the file is in the "Data" folder.

ls -la ~/Data/hg18.rmskRM327.bed.gz

# Console output

-rw------- 1 s00yao00 s00yao00 52052411 2014-12-20 19:39 /home/s00yao00/Data/hg18.rmskRM327.bed.gz
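A file listing only shows that the file exists. As an optional extra check, the sketch below tests the gzip archive for corruption and counts the records per chromosome. It assumes the joined file is tab-separated with the chromosome name in the first column; `count_by_chrom` is a hypothetical helper name.

```shell
# count_by_chrom FILE.bed.gz
# Hypothetical helper: verify the gzip archive, then print
# "<chromosome> <number of records>" for each chromosome found in column 1.
count_by_chrom() {
    gzip -t "$1" || return 1    # fail early if the archive is truncated or corrupt
    gunzip -c "$1" | cut -f 1 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `count_by_chrom ~/Data/hg18.rmskRM327.bed.gz` lists every chromosome that made it into the joined file, which makes a missing chromosome easy to spot.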

Last updated 5 years ago
