Quantification using RSEM

Like the previous exercise, we can use RSEM to estimate the expression levels of the re-constructed transcripts under the four conditions: logarithmic growth, plateau phase, heat shock and diauxic shift. First, we align the RNA-Seq reads to the Trinity transcripts using Bowtie. Then we run RSEM to estimate the number of reads mapped to each transcript. We do not need a splice-aware aligner (such as STAR) in this case because we are mapping the reads to cDNAs instead of a genomic sequence. Also the gap-free alignment produced by Bowtie is used as input for RSEM.

Execute

Locate util/align_and_estimate_abundance.pl in the trinityrnaseq-2.2.0 distribution, and run

cd ~/LSLNGS2015/Trinity

bsub -q 4G -o ./RSEM_Sp_ds.std -e ./RSEM_Sp_ds.err -J RSEM_Sp_ds \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_ds.left.fq.gz --right RNASEQ_data/Sp_ds.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_ds --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_ds"

bsub -q 4G -o ./RSEM_Sp_hs.std -e ./RSEM_Sp_hs.err -J RSEM_Sp_hs \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_hs.left.fq.gz --right RNASEQ_data/Sp_hs.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_hs --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_hs"

bsub -q 4G -o ./RSEM_Sp_log.std -e ./RSEM_Sp_log.err -J RSEM_Sp_log \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_log.left.fq.gz --right RNASEQ_data/Sp_log.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_log --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_log"

bsub -q 4G -o ./RSEM_Sp_plat.std -e ./RSEM_Sp_plat.err -J RSEM_Sp_plat \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_plat.left.fq.gz --right RNASEQ_data/Sp_plat.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_plat --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_plat"

Get status with bjobs

JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
993742  s00ycm0 RUN   4G         alps1       2*alps1-40  RSEM_Sp_ds Nov 18 23:12
993743  s00ycm0 RUN   4G         alps1       2*alps1-41  RSEM_Sp_hs Nov 18 23:12
993744  s00ycm0 RUN   4G         alps1       2*alps1-42  *EM_Sp_log Nov 18 23:12
993745  s00ycm0 RUN   4G         alps1       2*alps1-43  *M_Sp_plat Nov 18 23:12

Resource usage

Estimations

Once the jobs are completed, we will find *.isoforms.results and *.genes.results in the output folders. These files contain the expected counts and normalized expression values of the Trinity transcripts (isoforms) and components (genes).

ls -la RSEM_Sp_*/*results

-rw-rw-r-- 1 ycl6 ycl6 28893 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30731 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28579 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30369 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28928 Oct 27 12:42 RSEM_Sp_log/Sp_log.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30754 Oct 27 12:42 RSEM_Sp_log/Sp_log.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28866 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30685 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.isoforms.results

We can use head to examine these files. Your values may not be the same because the assembly results are not deterministic.

head RSEM_Sp_ds/Sp_ds.genes.results

gene_id transcript_id(s)        length  effective_length        expected_count  TPM     FPKM
TRINITY_DN0_c0_g1       TRINITY_DN0_c0_g1_i1    1420.00 1155.49 34.00   333.15  308.72
TRINITY_DN100_c0_g1     TRINITY_DN100_c0_g1_i1  289.00  39.76   21.72   6183.87 5730.39
TRINITY_DN101_c0_g1     TRINITY_DN101_c0_g1_i1  1695.00 1430.49 24.00   189.96  176.03
TRINITY_DN102_c0_g1     TRINITY_DN102_c0_g1_i1  3075.00 2810.49 110.00  443.14  410.65
TRINITY_DN103_c0_g1     TRINITY_DN103_c0_g1_i1  251.00  17.89   0.00    0.00    0.00
TRINITY_DN104_c0_g1     TRINITY_DN104_c0_g1_i1  3373.00 3108.49 106.00  386.09  357.78
TRINITY_DN105_c0_g1     TRINITY_DN105_c0_g1_i1  784.00  519.49  34.00   741.02  686.68
TRINITY_DN106_c0_g1     TRINITY_DN106_c0_g1_i1  1741.00 1476.49 83.87   643.14  595.98
TRINITY_DN107_c0_g1     TRINITY_DN107_c0_g1_i1  2416.00 2151.49 60.00   315.75  292.60

head RSEM_Sp_ds/Sp_ds.isoforms.results

transcript_id   gene_id length  effective_length        expected_count  TPM     FPKM    IsoPct
TRINITY_DN0_c0_g1_i1    TRINITY_DN0_c0_g1       1420    1155.49 34.00   333.15  308.72  100.00
TRINITY_DN100_c0_g1_i1  TRINITY_DN100_c0_g1     289     39.76   21.72   6183.87 5730.39 100.00
TRINITY_DN101_c0_g1_i1  TRINITY_DN101_c0_g1     1695    1430.49 24.00   189.96  176.03  100.00
TRINITY_DN102_c0_g1_i1  TRINITY_DN102_c0_g1     3075    2810.49 110.00  443.14  410.65  100.00
TRINITY_DN103_c0_g1_i1  TRINITY_DN103_c0_g1     251     17.89   0.00    0.00    0.00    0.00
TRINITY_DN104_c0_g1_i1  TRINITY_DN104_c0_g1     3373    3108.49 106.00  386.09  357.78  100.00
TRINITY_DN105_c0_g1_i1  TRINITY_DN105_c0_g1     784     519.49  34.00   741.02  686.68  100.00
TRINITY_DN106_c0_g1_i1  TRINITY_DN106_c0_g1     1741    1476.49 83.87   643.14  595.98  100.00
TRINITY_DN107_c0_g1_i1  TRINITY_DN107_c0_g1     2416    2151.49 60.00   315.75  292.60  100.00

Last updated