DNA Methylation at Repeat Elements

The BED file containing the repeat element annotations (prepared in Chapter 1.3) is located in the Data folder. In the hg18 version of the RepMask 3.2.7 annotation, the repeat elements are categorized into 21 repeat classes (see Table 2). In this demonstration, we will calculate and compare the methylation levels of five common repeat types: SINE, LINE, LTR, Satellite and DNA.

Table 2. Number of features in each repeat class in the RepMask 3.2.7 annotation (build hg18)

No. of Entries    Repeat Class
1757823           SINE
1468898           LINE
699087            LTR
454482            DNA
407205            Simple_repeat
364098            Low_complexity
6998              Unknown
6096              Satellite
4251              snRNA
3589              Other
2204              RC
1871              DNA?
1751              tRNA
1715              rRNA
1437              srpRNA
1296              scRNA
715               RNA
417               SINE?
123               LTR?
93                Unknown?
52                LINE?
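The counts in Table 2 can, in principle, be tallied directly from the annotation file. The sketch below assumes, as the intersectBed output further down suggests, that the repeat class sits in column 4 of the tab-delimited BED file; the function name count_repeat_classes is hypothetical and not part of the original workflow.

```shell
# Hypothetical helper: count the entries per repeat class (column 4) of a
# gzipped, tab-delimited BED file and list the most frequent class first.
count_repeat_classes() {
  gzip -dc "$1" \
    | cut -f 4 \
    | sort \
    | uniq -c \
    | sort -k1,1nr
}

# Example (assumes the file prepared in Chapter 1.3 is in ~/Data):
# count_repeat_classes ~/Data/hg18.rmskRM327.bed.gz
```

Each output line holds a count followed by a class name, i.e. the same two columns as Table 2.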

As before, we use the intersectBed and groupBy commands to calculate the average methylation level of each repeat element in the five repeat classes.

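cd ~/

bsub -q 16G -o stdout -e stderr "intersectBed -a Data/hg18.rmskRM327.bed.gz -b /work3/NRPB1219/hg18_h1_meth.bedGraph -wa -wb | awk -F $'\t' '\$4 ~ /^(Satellite|DNA|LTR|LINE|SINE)\$/' | groupBy -i - -g 1-4 -c 12,12 -o count,mean | awk -F $'\t' 'BEGIN { OFS=FS } { print \$1,\$2,\$3,\$4,\$5,sprintf(\"%.4f\",\$6) }' > Output/hg18.rmskRM327.h1.meth"

bsub -q 16G -o stdout -e stderr "intersectBed -a Data/hg18.rmskRM327.bed.gz -b /work3/NRPB1219/hg18_imr90_meth.bedGraph -wa -wb | awk -F $'\t' '\$4 ~ /^(Satellite|DNA|LTR|LINE|SINE)\$/' | groupBy -i - -g 1-4 -c 12,12 -o count,mean | awk -F $'\t' 'BEGIN { OFS=FS } { print \$1,\$2,\$3,\$4,\$5,sprintf(\"%.4f\",\$6) }' > Output/hg18.rmskRM327.imr90.meth"

The first few lines produced at the intersectBed and groupBy steps look like this:

# Console output

# intersectBed
chr1    468     1310    Satellite       0       -       telo    TAR1    chr1    468     469     0
chr1    468     1310    Satellite       0       -       telo    TAR1    chr1    470     471     0.666667
chr1    468     1310    Satellite       0       -       telo    TAR1    chr1    483     484     0.5
chr1    468     1310    Satellite       0       -       telo    TAR1    chr1    488     489     1
chr1    468     1310    Satellite       0       -       telo    TAR1    chr1    492     493     0.857143

# groupBy
chr1    468     1310    Satellite       89      0.21305
chr1    1540    1643    DNA     1       0
chr1    5128    5208    SINE    1       0
chr1    8769    8911    LINE    2       0
chr1    9877    10268   LINE    8       0

In the intersectBed output, columns 1-8 come from the repeat annotation (with the repeat class in column 4) and columns 9-12 come from the methylation bedGraph, so column 12 is the methylation level of one CpG site. The first awk command keeps only rows whose class in column 4 is exactly one of the five classes, so uncertain classes such as "DNA?" or "LINE?" are excluded. groupBy then collapses the rows of each repeat element (columns 1-4) into the number of CpG sites (count) and their mean methylation (mean), and the final awk command rounds the mean to four decimal places.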
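To make the groupBy step concrete, the sketch below re-implements in plain awk what `groupBy -g 1-4 -c 12,12 -o count,mean` plus the `sprintf("%.4f")` formatting is doing in these pipelines. It is only an illustration, not a replacement for groupBy; like groupBy, it assumes that all rows of one repeat element are adjacent, which should hold here because intersectBed reports the hits of each -a feature together. The function name mean_meth_per_element is hypothetical.

```shell
# Hypothetical helper mimicking the groupBy step: for each run of consecutive
# lines with identical columns 1-4 (one repeat element), print those four
# columns, the number of CpG sites (count of column 12) and their mean
# methylation (mean of column 12) rounded to four decimal places.
mean_meth_per_element() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    {
      key = $1 OFS $2 OFS $3 OFS $4
      # A new element starts: report the previous one and reset the sums.
      if (NR > 1 && key != prev) {
        print prev, n, sprintf("%.4f", sum / n)
        n = 0; sum = 0
      }
      prev = key; n++; sum += $12
    }
    # Report the last element (skipped for empty input).
    END { if (NR > 0) print prev, n, sprintf("%.4f", sum / n) }'
}

# Example: pipe intersectBed -wa -wb output (already filtered to the five
# classes) into mean_meth_per_element instead of groupBy and the final awk.
```

On the five intersectBed rows shown above for the Satellite element at chr1:468-1310, this would report a count of 5 and a mean of 0.6048; the 89 and 0.21305 in the groupBy output come from all 89 CpG sites of that element, not just the five rows displayed.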

Use bjobs to check that all jobs have completed, and ls to check that the output files are in the "Output" folder.

ls -la ~/Output/hg18.rmskRM327.*.meth

# Console output

-rw------- 1 s00yao00 s00yao00 119358409 2014-12-20 19:57 /home/s00yao00/Output/hg18.rmskRM327.h1.meth
-rw------- 1 s00yao00 s00yao00 119358409 2014-12-20 19:57 /home/s00yao00/Output/hg18.rmskRM327.imr90.meth
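To compare the five repeat classes between H1 and IMR90, each *.meth file can be summarised by class. The sketch below averages column 6 (the per-element mean methylation) over all elements of each class in column 4, so every element counts once regardless of how many CpG sites it covers; weighting by the CpG count in column 5 would be an equally reasonable choice. The function name class_mean_meth is hypothetical and this summary step is an assumption, not part of the original workflow.

```shell
# Hypothetical helper: for each repeat class (column 4) of a *.meth file,
# print the class, the number of elements and the unweighted average of
# the per-element mean methylation (column 6), sorted by class name.
class_mean_meth() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    { sum[$4] += $6; n[$4]++ }
    END { for (c in sum) print c, n[c], sprintf("%.4f", sum[c] / n[c]) }' "$1" \
    | sort -k1,1
}

# Example, one summary per cell line:
# class_mean_meth ~/Output/hg18.rmskRM327.h1.meth
# class_mean_meth ~/Output/hg18.rmskRM327.imr90.meth
```

Because both summaries are sorted by class name, the two tables can be read side by side (or combined with join) to compare the cell lines class by class.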
