Quantification using RSEM
In this tutorial, we use RSEM to quantify the expression of genes and transcript. In the previous step, we instruct STAR to output genomic alignments in transcriptomic coordinates (i.e. Aligned.toTranscriptome.out.bam). We input this file to RSEM to produce gene and transcript expression levels.
Usage
rsem-calculate-expression [options] upstream_read_file(s) reference_name sample_name
rsem-calculate-expression [options] --paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name
rsem-calculate-expression [options] --sam/--bam [--paired-end] input reference_name sample_nameExecute
cd ~/LSLNGS2015/
mkdir RNASEQ_data/rsem_GM12878_rep1 RNASEQ_data/rsem_GM12878_rep2
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_GM12878_rep1/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_GM12878_rep1/rsem >& \
RNASEQ_data/rsem_GM12878_rep1/rsem.log
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_GM12878_rep2/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_GM12878_rep2/rsem >& \
RNASEQ_data/rsem_GM12878_rep2/rsem.log
mkdir RNASEQ_data/rsem_K562_rep1 RNASEQ_data/rsem_K562_rep2
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_K562_rep1/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_K562_rep1/rsem >& \
RNASEQ_data/rsem_K562_rep1/rsem.log
/home/ycl6/tools/RSEM/rsem-calculate-expression --bam --no-bam-output -p 10 \
--paired-end --forward-prob 0 \
RNASEQ_data/star_K562_rep2/Aligned.toTranscriptome.out.bam \
GENOME_data/rsem/GRCh38 RNASEQ_data/rsem_K562_rep2/rsem >& \
RNASEQ_data/rsem_K562_rep2/rsem.logResource usage
Sample
ALPS Queue Name
CPU Time
Max Memory
Duration
GM12878
48G
220573.45 sec.
6 GB
6 hours 21 minutes 55 seconds
K562
48G
220413.08 sec.
7 GB
6 hours 41 minutes 16 seconds
Options
--bam Input file is in BAM format.
--no-bam-output Do not output any BAM file.
-p Number of threads to use.
--paired-end Input reads are paired-end reads.
--forward-prob Probability of generating a read from the forward strand of a transcript. 1: strand-specific protocol where all (upstream) reads are derived from the forward strand; 0: strand-specific protocol where all (upstream) read are derived from the reverse strand; 0.5: non-strand-specific protocol.
Output
RSEM generates 2 result files: 1. rsem.genes.results 2. rsem.isoforms.results.
We can use head to examine these files.
head ~/LSLNGS2015/RNASEQ_data/rsem_GM12878_rep1/rsem.genes.results
head ~/LSLNGS2015/RNASEQ_data/rsem_GM12878_rep1/rsem.isoforms.results
Check number of lines in all result files with wc -l
wc -l RNASEQ_data/rsem_*/*results
Prepare gene-level and transcript-level expression matrices
We use paste command to join the rsem.genes.results files side-by-side, then use cut to select the columns containing the expected_count information, and place them into a final output file. Repeat the same step for isoforms.
This one-line command assumes the genes (and transcripts) in each files are in the same order. If they are not, you will have to sort the files before joining them together.
Let us look at the content of the created data matrices
Gene-level: head RNASEQ_data/edgeR.genes.rsem.txt
Transcript-level: head RNASEQ_data/edgeR.isoforms.rsem.txt
Last updated
Was this helpful?