DNA Methylation Sequencing Analysis
  • Introduction
  • Data Preparation
    • Locate the MethPipe Files
    • Download Utilities
    • Download Annotations
    • Annotation File Preparation – Defining Genomic Regions
  • Analysis Work Flow
    • DNA Methylation at Genomic Bins
    • DNA Methylation at CpG Islands
    • DNA Methylation at TFBS
    • DNA Methylation at Various Genic Structure Regions
    • DNA Methylation at Repeat Elements
    • Add CpG Islands Co-localization Information to HMR BED Files
    • Similarity and Differences of HMRs and PMDs from H1 and IMR90
      • HMRs
      • PMDs
  • Visualization Using R
    • Install R Libraries
    • Execute the R Scripts
  • An introduction of UCSC Genome Browser
    • General Usage
    • The Compressed Binary Index Format
Powered by GitBook
On this page

Was this helpful?

  1. Analysis Work Flow

DNA Methylation at Genomic Bins

For each one of the 500 bins, we use the intersectBed program of the bedtools utilities to compare the bins.bed file and meth.bedGraph files base-by-base. The output is piped into groupBy to calculate the number of features being joined (in this case is the methylation values at CpG) and the average methylation level for each bin (i.e. -c 7,7 -o count,mean). Lastly, awk was used to print the methylation levels in four decimal places.

# Console output

# intersectBed
chr1    0       494500  chr1    468     469     0
chr1    0       494500  chr1    470     471     0.666667
chr1    0       494500  chr1    483     484     0.5
chr1    0       494500  chr1    488     489     1
chr1    0       494500  chr1    492     493     0.857143

# groupBy
chr1    0       494500  3973    0.051545
chr1    494500  989000  11539   0.29057
chr1    989000  1483500 19948   0.28879
chr1    1483500 1978000 16014   0.4603
chr1    1978000 2472500 17089   0.47023

# awk
chr1    0       494500  3973    0.0515
chr1    494500  989000  11539   0.2906
chr1    989000  1483500 19948   0.2888
chr1    1483500 1978000 16014   0.4603
chr1    1978000 2472500 17089   0.4702
cd ~/

bsub -q 16G -o stdout -e stderr "intersectBed -a Data/hg18.500bins.bed.gz -b /work3/NRPB1219/hg18_h1_meth.bedGraph -wa -wb | groupBy -i - -g 1-3 -c 7,7 -o count,mean | awk -F $'\t' 'BEGIN { OFS=FS } { print \$1,\$2,\$3,\$4,sprintf(\"%.4f\",\$5) }' > Output/hg18.500bins.h1.meth"

bsub -q 16G -o stdout -e stderr "intersectBed -a Data/hg18.500bins.bed.gz -b /work3/NRPB1219/hg18_imr90_meth.bedGraph -wa -wb | groupBy -i - -g 1-3 -c 7,7 -o count,mean | awk -F $'\t' 'BEGIN { OFS=FS } { print \$1,\$2,\$3,\$4,sprintf(\"%.4f\",\$5) }' > Output/hg18.500bins.imr90.meth"

Use bjobs to check the all jobs have completed and ls to check the files was in the "Output" folder.

ls -la ~/Output/hg18.500bins.*.meth

# Console output

-rw------- 1 s00yao00 s00yao00 737091 2014-12-20 16:26 /home/s00yao00/Output/hg18.500bins.h1.meth
-rw------- 1 s00yao00 s00yao00 737091 2014-12-20 16:26 /home/s00yao00/Output/hg18.500bins.imr90.meth
PreviousAnalysis Work FlowNextDNA Methylation at CpG Islands

Last updated 5 years ago

Was this helpful?