DNA Methylation at Genomic Bins
For each one of the 500 bins, we use the intersectBed
program of the bedtools utilities to compare the bins.bed
file and meth.bedGraph
files base-by-base. The output is piped into groupBy
to calculate the number of features being joined (in this case is the methylation values at CpG) and the average methylation level for each bin (i.e. -c 7,7 -o count,mean
). Lastly, awk
was used to print the methylation levels in four decimal places.
# Console output
# intersectBed
chr1 0 494500 chr1 468 469 0
chr1 0 494500 chr1 470 471 0.666667
chr1 0 494500 chr1 483 484 0.5
chr1 0 494500 chr1 488 489 1
chr1 0 494500 chr1 492 493 0.857143
# groupBy
chr1 0 494500 3973 0.051545
chr1 494500 989000 11539 0.29057
chr1 989000 1483500 19948 0.28879
chr1 1483500 1978000 16014 0.4603
chr1 1978000 2472500 17089 0.47023
# awk
chr1 0 494500 3973 0.0515
chr1 494500 989000 11539 0.2906
chr1 989000 1483500 19948 0.2888
chr1 1483500 1978000 16014 0.4603
chr1 1978000 2472500 17089 0.4702
cd ~/
bsub -q 16G -o stdout -e stderr "intersectBed -a Data/hg18.500bins.bed.gz -b /work3/NRPB1219/hg18_h1_meth.bedGraph -wa -wb | groupBy -i - -g 1-3 -c 7,7 -o count,mean | awk -F $'\t' 'BEGIN { OFS=FS } { print \$1,\$2,\$3,\$4,sprintf(\"%.4f\",\$5) }' > Output/hg18.500bins.h1.meth"
bsub -q 16G -o stdout -e stderr "intersectBed -a Data/hg18.500bins.bed.gz -b /work3/NRPB1219/hg18_imr90_meth.bedGraph -wa -wb | groupBy -i - -g 1-3 -c 7,7 -o count,mean | awk -F $'\t' 'BEGIN { OFS=FS } { print \$1,\$2,\$3,\$4,sprintf(\"%.4f\",\$5) }' > Output/hg18.500bins.imr90.meth"
Use bjobs
to check the all jobs have completed and ls
to check the files was in the "Output" folder.
ls -la ~/Output/hg18.500bins.*.meth
# Console output
-rw------- 1 s00yao00 s00yao00 737091 2014-12-20 16:26 /home/s00yao00/Output/hg18.500bins.h1.meth
-rw------- 1 s00yao00 s00yao00 737091 2014-12-20 16:26 /home/s00yao00/Output/hg18.500bins.imr90.meth
Last updated
Was this helpful?