Quantification using RSEM
In this tutorial, we use RSEM to quantify the expression of genes and transcript. In the previous step, we instruct STAR to output genomic alignments in transcriptomic coordinates (i.e. Aligned.toTranscriptome.out.bam). We input this file to RSEM to produce gene and transcript expression levels.

Usage

1
rsem-calculate-expression [options] upstream_read_file(s) reference_name sample_name
2
rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name
3
rsem-calculate-expression [options] --sam/--bam [--paired-end] input reference_name sample_name
Copied!

Execute

1
cd ~/LSLNGS2015/
2
mkdir RNASEQ_data/rsem_GM12878_rep1 RNASEQ_data/rsem_GM12878_rep2
3
4
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
5
--paired-end --forward-prob 0 \
6
RNASEQ_data/star_GM12878_rep1/Aligned.toTranscriptome.out.bam \
7
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_GM12878_rep1/rsem >& \
8
RNASEQ_data/rsem_GM12878_rep1/rsem.log
9
10
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
11
--paired-end --forward-prob 0 \
12
RNASEQ_data/star_GM12878_rep2/Aligned.toTranscriptome.out.bam \
13
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_GM12878_rep2/rsem >& \
14
RNASEQ_data/rsem_GM12878_rep2/rsem.log
15
16
mkdir RNASEQ_data/rsem_K562_rep1 RNASEQ_data/rsem_K562_rep2
17
18
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
19
--paired-end --forward-prob 0 \
20
RNASEQ_data/star_K562_rep1/Aligned.toTranscriptome.out.bam \
21
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_K562_rep1/rsem >& \
22
RNASEQ_data/rsem_K562_rep1/rsem.log
23
24
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
25
--paired-end --forward-prob 0 \
26
RNASEQ_data/star_K562_rep2/Aligned.toTranscriptome.out.bam \
27
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_K562_rep2/rsem >& \
28
RNASEQ_data/rsem_K562_rep2/rsem.log
Copied!

Resource usage

Sample
ALPS Queue Name
CPU Time
Max Memory
Duration
GM12878
48G
220573.45 sec.
6 GB
6 hours 21 minutes 55 seconds
K562
48G
220413.08 sec.
7 GB
6 hours 41 minutes 16 seconds

Options

--bam Input file is in BAM format.
--no-bam-output Do not output any BAM file.
-p Number of threads to use.
--paired-end Input reads are paired-end reads.
--forward-prob Probability of generating a read from the forward strand of a transcript. 1: strand-specific protocol where all (upstream) reads are derived from the forward strand; 0: strand-specific protocol where all (upstream) read are derived from the reverse strand; 0.5: non-strand-specific protocol.

Output

RSEM generates 2 result files: 1. rsem.genes.results 2. rsem.isoforms.results.
We can use head to examine these files.
head ~/LSLNGS2015/RNASEQ_data/rsem_GM12878_rep1/rsem.genes.results
1
gene_id transcript_id(s) length effective_length expected_count TPM FPKM
2
ENSG00000000003 ENST00000373020,ENST00000494424,ENST00000496771,ENST00000612152,ENST00000614008 2206.00 1997.20 1.00 0.00 0.01
3
ENSG00000000005 ENST00000373031,ENST00000485971 940.50 731.73 0.00 0.00 0.00
4
ENSG00000000419 ENST00000371582,ENST00000371584,ENST00000371588,ENST00000413082,ENST00000466152,ENST00000494752 977.15 768.35 1865.00 14.27 37.82
5
ENSG00000000457 ENST00000367770,ENST00000367771,ENST00000367772,ENST00000423670,ENST00000470238 3779.11 3570.31 1521.00 2.50 6.64
6
ENSG00000000460 ENST00000286031,ENST00000359326,ENST00000413811,ENST00000459772,ENST00000466580,ENST00000472795,ENST00000481744,ENST00000496973,ENST00000498289 1936.74 1727.94 1860.00 6.33 16.77
7
ENSG00000000938 ENST00000374003,ENST00000374004,ENST00000374005,ENST00000399173,ENST00000457296,ENST00000468038,ENST00000475472 2020.10 1811.30 6846.00 22.22 58.90
8
ENSG00000000971 ENST00000359637,ENST00000367429,ENST00000466229,ENST00000470918,ENST00000496761,ENST00000630130 2587.83 2379.04 0.00 0.00 0.00
9
ENSG00000001036 ENST00000002165,ENST00000367585,ENST00000451668 1912.64 1703.85 1358.00 4.69 12.42
10
ENSG00000001084 ENST00000229416,ENST00000504353,ENST00000504525,ENST00000505197,ENST00000505294,ENST00000509541,ENST00000510837,ENST00000513939,ENST00000514004,ENST00000514373,ENST00000514933,ENST00000515580,ENST00000616923 2333.50 2124.73 1178.00 3.26 8.64
Copied!
head ~/LSLNGS2015/RNASEQ_data/rsem_GM12878_rep1/rsem.isoforms.results
1
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
2
ENST00000373020 ENSG00000000003 2206 1997.20 1.00 0.00 0.01 100.00
3
ENST00000494424 ENSG00000000003 820 611.20 0.00 0.00 0.00 0.00
4
ENST00000496771 ENSG00000000003 1025 816.20 0.00 0.00 0.00 0.00
5
ENST00000612152 ENSG00000000003 3796 3587.20 0.00 0.00 0.00 0.00
6
ENST00000614008 ENSG00000000003 900 691.20 0.00 0.00 0.00 0.00
7
ENST00000373031 ENSG00000000005 1339 1130.20 0.00 0.00 0.00 0.00
8
ENST00000485971 ENSG00000000005 542 333.26 0.00 0.00 0.00 0.00
9
ENST00000371582 ENSG00000000419 1161 952.20 16.56 0.10 0.27 0.72
10
ENST00000371584 ENSG00000000419 1073 864.20 8.54 0.06 0.15 0.41
Copied!
Check number of lines in all result files with wc -l
wc -l RNASEQ_data/rsem_*/*results
1
58052 RNASEQ_data/rsem_GM12878_rep1/rsem.genes.results
2
198003 RNASEQ_data/rsem_GM12878_rep1/rsem.isoforms.results
3
58052 RNASEQ_data/rsem_GM12878_rep2/rsem.genes.results
4
198003 RNASEQ_data/rsem_GM12878_rep2/rsem.isoforms.results
5
58052 RNASEQ_data/rsem_K562_rep1/rsem.genes.results
6
198003 RNASEQ_data/rsem_K562_rep1/rsem.isoforms.results
7
58052 RNASEQ_data/rsem_K562_rep2/rsem.genes.results
8
198003 RNASEQ_data/rsem_K562_rep2/rsem.isoforms.results
Copied!

Prepare gene-level and transcript-level expression matrices

We use paste command to join the rsem.genes.results files side-by-side, then use cut to select the columns containing the expected_count information, and place them into a final output file. Repeat the same step for isoforms.
This one-line command assumes the genes (and transcripts) in each files are in the same order. If they are not, you will have to sort the files before joining them together.
1
cd ~/LSLNGS2015/
2
3
paste RNASEQ_data/rsem_*/rsem.genes.results | tail -n+2 | \
4
cut -f1,5,12,19,26 > RNASEQ_data/edgeR.genes.rsem.txt
5
6
paste RNASEQ_data/rsem_*/rsem.isoforms.results | tail -n+2 | \
7
cut -f1,5,13,21,29 > RNASEQ_data/edgeR.isoforms.rsem.txt
Copied!
Let us look at the content of the created data matrices
Gene-level: head RNASEQ_data/edgeR.genes.rsem.txt
1
ENSG00000000003 1.00 0.00 3.00 2.00
2
ENSG00000000005 0.00 0.00 0.00 0.00
3
ENSG00000000419 1865.00 1951.00 5909.00 8163.00
4
ENSG00000000457 1521.00 1488.00 849.00 1400.00
5
ENSG00000000460 1860.00 1616.00 2577.00 2715.00
6
ENSG00000000938 6846.00 5298.00 1.00 2.00
7
ENSG00000000971 0.00 0.00 6159.00 7069.00
8
ENSG00000001036 1358.00 1186.00 6196.00 7009.00
9
ENSG00000001084 1178.00 1186.00 631.00 1293.00
10
ENSG00000001167 2247.00 1835.00 2887.00 3155.00
Copied!
Transcript-level: head RNASEQ_data/edgeR.isoforms.rsem.txt
1
ENST00000373020 1.00 0.00 2.88 2.00
2
ENST00000494424 0.00 0.00 0.12 0.00
3
ENST00000496771 0.00 0.00 0.00 0.00
4
ENST00000612152 0.00 0.00 0.00 0.00
5
ENST00000614008 0.00 0.00 0.00 0.00
6
ENST00000373031 0.00 0.00 0.00 0.00
7
ENST00000485971 0.00 0.00 0.00 0.00
8
ENST00000371582 16.56 53.45 530.00 620.89
9
ENST00000371584 8.54 44.25 234.93 368.14
10
ENST00000371588 1562.96 1853.30 5069.81 7081.71
Copied!
Last modified 1yr ago