Quantification using RSEM
Like the previous exercise, we can use RSEM to estimate the expression levels of the re-constructed transcripts under the four conditions: logarithmic growth, plateau phase, heat shock and diauxic shift. First, we align the RNA-Seq reads to the Trinity transcripts using Bowtie. Then we run RSEM to estimate the number of reads mapped to each transcript. We do not need a splice-aware aligner (such as STAR) in this case because we are mapping the reads to cDNAs instead of a genomic sequence. Also the gap-free alignment produced by Bowtie is used as input for RSEM.
Execute
Locate util/align_and_estimate_abundance.pl
in the trinityrnaseq-2.2.0 distribution, and run
cd ~/LSLNGS2015/Trinity
bsub -q 4G -o ./RSEM_Sp_ds.std -e ./RSEM_Sp_ds.err -J RSEM_Sp_ds \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_ds.left.fq.gz --right RNASEQ_data/Sp_ds.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_ds --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_ds"
bsub -q 4G -o ./RSEM_Sp_hs.std -e ./RSEM_Sp_hs.err -J RSEM_Sp_hs \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_hs.left.fq.gz --right RNASEQ_data/Sp_hs.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_hs --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_hs"
bsub -q 4G -o ./RSEM_Sp_log.std -e ./RSEM_Sp_log.err -J RSEM_Sp_log \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_log.left.fq.gz --right RNASEQ_data/Sp_log.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_log --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_log"
bsub -q 4G -o ./RSEM_Sp_plat.std -e ./RSEM_Sp_plat.err -J RSEM_Sp_plat \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_plat.left.fq.gz --right RNASEQ_data/Sp_plat.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_plat --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_plat"
Get status with bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
993742 s00ycm0 RUN 4G alps1 2*alps1-40 RSEM_Sp_ds Nov 18 23:12
993743 s00ycm0 RUN 4G alps1 2*alps1-41 RSEM_Sp_hs Nov 18 23:12
993744 s00ycm0 RUN 4G alps1 2*alps1-42 *EM_Sp_log Nov 18 23:12
993745 s00ycm0 RUN 4G alps1 2*alps1-43 *M_Sp_plat Nov 18 23:12
Resource usage
Job
ALPS Queue Name
CPU Time
Max Memory
Duration
RSEM_Sp_ds
4G
42.28 sec.
-
32 seconds
RSEM_Sp_hs
4G
36.40 sec.
-
26 seconds
RSEM_Sp_log
4G
14.28 sec.
-
13 seconds
RSEM_Sp_plat
4G
48.40 sec.
-
38 seconds
Estimations
Once the jobs are completed, we will find *.isoforms.results
and *.genes.results
in the output folders. These files contain the expected counts and normalized expression values of the Trinity transcripts (isoforms) and components (genes).
ls -la RSEM_Sp_*/*results
-rw-rw-r-- 1 ycl6 ycl6 28893 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30731 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28579 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30369 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28928 Oct 27 12:42 RSEM_Sp_log/Sp_log.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30754 Oct 27 12:42 RSEM_Sp_log/Sp_log.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28866 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30685 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.isoforms.results
We can use head
to examine these files. Your values may not be the same because the assembly results are not deterministic.
head RSEM_Sp_ds/Sp_ds.genes.results
gene_id transcript_id(s) length effective_length expected_count TPM FPKM
TRINITY_DN0_c0_g1 TRINITY_DN0_c0_g1_i1 1420.00 1155.49 34.00 333.15 308.72
TRINITY_DN100_c0_g1 TRINITY_DN100_c0_g1_i1 289.00 39.76 21.72 6183.87 5730.39
TRINITY_DN101_c0_g1 TRINITY_DN101_c0_g1_i1 1695.00 1430.49 24.00 189.96 176.03
TRINITY_DN102_c0_g1 TRINITY_DN102_c0_g1_i1 3075.00 2810.49 110.00 443.14 410.65
TRINITY_DN103_c0_g1 TRINITY_DN103_c0_g1_i1 251.00 17.89 0.00 0.00 0.00
TRINITY_DN104_c0_g1 TRINITY_DN104_c0_g1_i1 3373.00 3108.49 106.00 386.09 357.78
TRINITY_DN105_c0_g1 TRINITY_DN105_c0_g1_i1 784.00 519.49 34.00 741.02 686.68
TRINITY_DN106_c0_g1 TRINITY_DN106_c0_g1_i1 1741.00 1476.49 83.87 643.14 595.98
TRINITY_DN107_c0_g1 TRINITY_DN107_c0_g1_i1 2416.00 2151.49 60.00 315.75 292.60
head RSEM_Sp_ds/Sp_ds.isoforms.results
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
TRINITY_DN0_c0_g1_i1 TRINITY_DN0_c0_g1 1420 1155.49 34.00 333.15 308.72 100.00
TRINITY_DN100_c0_g1_i1 TRINITY_DN100_c0_g1 289 39.76 21.72 6183.87 5730.39 100.00
TRINITY_DN101_c0_g1_i1 TRINITY_DN101_c0_g1 1695 1430.49 24.00 189.96 176.03 100.00
TRINITY_DN102_c0_g1_i1 TRINITY_DN102_c0_g1 3075 2810.49 110.00 443.14 410.65 100.00
TRINITY_DN103_c0_g1_i1 TRINITY_DN103_c0_g1 251 17.89 0.00 0.00 0.00 0.00
TRINITY_DN104_c0_g1_i1 TRINITY_DN104_c0_g1 3373 3108.49 106.00 386.09 357.78 100.00
TRINITY_DN105_c0_g1_i1 TRINITY_DN105_c0_g1 784 519.49 34.00 741.02 686.68 100.00
TRINITY_DN106_c0_g1_i1 TRINITY_DN106_c0_g1 1741 1476.49 83.87 643.14 595.98 100.00
TRINITY_DN107_c0_g1_i1 TRINITY_DN107_c0_g1 2416 2151.49 60.00 315.75 292.60 100.00
Last updated
Was this helpful?