Quantification using RSEM

In this tutorial, we use RSEM to quantify the expression of genes and transcript. In the previous step, we instruct STAR to output genomic alignments in transcriptomic coordinates (i.e. Aligned.toTranscriptome.out.bam). We input this file to RSEM to produce gene and transcript expression levels.

Usage

rsem-calculate-expression [options] upstream_read_file(s) reference_name sample_name
rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name
rsem-calculate-expression [options] --sam/--bam [--paired-end] input reference_name sample_name

Execute

cd ~/LSLNGS2015/
mkdir RNASEQ_data/rsem_GM12878_rep1 RNASEQ_data/rsem_GM12878_rep2

/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_GM12878_rep1/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_GM12878_rep1/rsem >& \
RNASEQ_data/rsem_GM12878_rep1/rsem.log

/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_GM12878_rep2/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_GM12878_rep2/rsem >& \
RNASEQ_data/rsem_GM12878_rep2/rsem.log

mkdir RNASEQ_data/rsem_K562_rep1 RNASEQ_data/rsem_K562_rep2

/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_K562_rep1/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_K562_rep1/rsem >& \
RNASEQ_data/rsem_K562_rep1/rsem.log

/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_K562_rep2/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_K562_rep2/rsem >& \
RNASEQ_data/rsem_K562_rep2/rsem.log

Resource usage

Sample

ALPS Queue Name

CPU Time

Max Memory

Duration

GM12878

48G

220573.45 sec.

6 GB

6 hours 21 minutes 55 seconds

K562

48G

220413.08 sec.

7 GB

6 hours 41 minutes 16 seconds

Options

--bam Input file is in BAM format.

--no-bam-output Do not output any BAM file.

-p Number of threads to use.

--paired-end Input reads are paired-end reads.

--forward-prob Probability of generating a read from the forward strand of a transcript. 1: strand-specific protocol where all (upstream) reads are derived from the forward strand; 0: strand-specific protocol where all (upstream) read are derived from the reverse strand; 0.5: non-strand-specific protocol.

Output

RSEM generates 2 result files: 1. rsem.genes.results 2. rsem.isoforms.results.

We can use head to examine these files.

head ~/LSLNGS2015/RNASEQ_data/rsem_GM12878_rep1/rsem.genes.results

head ~/LSLNGS2015/RNASEQ_data/rsem_GM12878_rep1/rsem.isoforms.results

Check number of lines in all result files with wc -l

wc -l RNASEQ_data/rsem_*/*results

Prepare gene-level and transcript-level expression matrices

We use paste command to join the rsem.genes.results files side-by-side, then use cut to select the columns containing the expected_count information, and place them into a final output file. Repeat the same step for isoforms.

This one-line command assumes the genes (and transcripts) in each files are in the same order. If they are not, you will have to sort the files before joining them together.

Let us look at the content of the created data matrices

Gene-level: head RNASEQ_data/edgeR.genes.rsem.txt

Transcript-level: head RNASEQ_data/edgeR.isoforms.rsem.txt

Last updated

Was this helpful?