Quantification using RSEM
As in the previous exercise, we can use RSEM to estimate the expression levels of the reconstructed transcripts in the four conditions: logarithmic growth, plateau phase, heat shock and diauxic shift. First, we align the RNA-Seq reads to the Trinity transcripts with Bowtie; then we run RSEM to estimate the number of reads that map to each transcript. A splice-aware aligner (such as STAR) is not needed here, because the reads are mapped to cDNA sequences rather than to a genome. In addition, Bowtie produces the gap-free alignments that RSEM takes as input.

Execute

Locate util/align_and_estimate_abundance.pl in the trinityrnaseq-2.2.0 distribution, replace PATH_TO_TRINITY in the commands below with the path to that distribution, and run:
cd ~/LSLNGS2015/Trinity

bsub -q 4G -o ./RSEM_Sp_ds.std -e ./RSEM_Sp_ds.err -J RSEM_Sp_ds \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_ds.left.fq.gz --right RNASEQ_data/Sp_ds.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_ds --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_ds"

bsub -q 4G -o ./RSEM_Sp_hs.std -e ./RSEM_Sp_hs.err -J RSEM_Sp_hs \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_hs.left.fq.gz --right RNASEQ_data/Sp_hs.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_hs --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_hs"

bsub -q 4G -o ./RSEM_Sp_log.std -e ./RSEM_Sp_log.err -J RSEM_Sp_log \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_log.left.fq.gz --right RNASEQ_data/Sp_log.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_log --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_log"

bsub -q 4G -o ./RSEM_Sp_plat.std -e ./RSEM_Sp_plat.err -J RSEM_Sp_plat \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/Sp_plat.left.fq.gz --right RNASEQ_data/Sp_plat.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_plat --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_plat"
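The four submissions above differ only in the sample name, so they can also be generated from one shell function. This is an illustrative sketch, not part of the original exercise: `submit_rsem`, `TRINITY_HOME` and the `BSUB` override (useful for a dry run with `BSUB=echo`) are hypothetical names, while the Trinity and bsub options are copied unchanged from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit one RSEM job for a sample prefix such as Sp_ds.
# TRINITY_HOME stands in for PATH_TO_TRINITY; BSUB defaults to the real bsub
# but can be set to "echo" to print the command instead of submitting it.
TRINITY_HOME=${TRINITY_HOME:-PATH_TO_TRINITY}

submit_rsem() {
  local s=$1
  # Inside double quotes, backslash-newline is removed, so cmd is one line.
  local cmd="${TRINITY_HOME}/util/align_and_estimate_abundance.pl --seqType fq \
--left RNASEQ_data/${s}.left.fq.gz --right RNASEQ_data/${s}.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix ${s} --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_${s}"
  # BSUB is left unquoted on purpose so a value like "echo" runs as a command.
  ${BSUB:-bsub} -q 4G -o "./RSEM_${s}.std" -e "./RSEM_${s}.err" -J "RSEM_${s}" "$cmd"
}
```

From ~/LSLNGS2015/Trinity, running `BSUB=echo submit_rsem Sp_ds` prints the command line for one sample; `for s in Sp_ds Sp_hs Sp_log Sp_plat; do submit_rsem "$s"; done` would then submit all four jobs.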
Get status with bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
993742 s00ycm0 RUN 4G alps1 2*alps1-40 RSEM_Sp_ds Nov 18 23:12
993743 s00ycm0 RUN 4G alps1 2*alps1-41 RSEM_Sp_hs Nov 18 23:12
993744 s00ycm0 RUN 4G alps1 2*alps1-42 *EM_Sp_log Nov 18 23:12
993745 s00ycm0 RUN 4G alps1 2*alps1-43 *M_Sp_plat Nov 18 23:12

Resource usage

Job            ALPS queue  CPU time (s)  Max memory  Duration (s)
RSEM_Sp_ds     4G          42.28         -           32
RSEM_Sp_hs     4G          36.40         -           26
RSEM_Sp_log    4G          14.28         -           13
RSEM_Sp_plat   4G          48.40         -           38

Estimations

Once the jobs have completed, each output folder contains a *.isoforms.results and a *.genes.results file. These files contain the expected counts and normalized expression values (TPM and FPKM) of the Trinity transcripts (isoforms) and components (genes).
ls -la RSEM_Sp_*/*results
-rw-rw-r-- 1 ycl6 ycl6 28893 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30731 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28579 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30369 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28928 Oct 27 12:42 RSEM_Sp_log/Sp_log.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30754 Oct 27 12:42 RSEM_Sp_log/Sp_log.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28866 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30685 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.isoforms.results
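To compare the four conditions side by side, it helps to collect one column from each results file into a single table. The following is a minimal sketch with a hypothetical function name, `merge_rsem_column`. It assumes the files are tab-delimited with a one-line header (as RSEM writes them) and takes the row order from the first file, which should list the same IDs as the others because all four runs use the same Trinity.fasta.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge one column of several RSEM result files into a
# tab-separated table with one row per ID and one column per sample.
# Usage: merge_rsem_column COL FILE...
#   COL is 1-based; in *.genes.results, 5 = expected_count, 6 = TPM, 7 = FPKM.
merge_rsem_column() {
  local col=$1 f name
  shift
  # Header: first column name of the first file, then one name per sample
  # (file name without directory and .genes/.isoforms.results suffix).
  printf '%s' "$(head -n 1 "$1" | cut -f1)"
  for f in "$@"; do
    name=${f##*/}
    name=${name%.results}
    name=${name%.genes}
    name=${name%.isoforms}
    printf '\t%s' "$name"
  done
  printf '\n'
  awk -F'\t' -v col="$col" '
    FNR == 1 { k++; next }          # first line of each file is a header
    k == 1   { ids[++n] = $1 }      # row order is taken from the first file
             { val[$1, k] = $col }  # value for this ID in file number k
    END {
      for (i = 1; i <= n; i++) {
        line = ids[i]
        for (j = 1; j <= k; j++) line = line "\t" val[ids[i], j]
        print line
      }
    }' "$@"
}
```

For example, `merge_rsem_column 5 RSEM_Sp_*/Sp_*.genes.results > expected_counts.tsv` would write a table of expected counts with the columns Sp_ds, Sp_hs, Sp_log and Sp_plat (in glob order).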
We can use head to examine these files. Your values may differ, because Trinity assembly results are not deterministic.
head RSEM_Sp_ds/Sp_ds.genes.results
gene_id transcript_id(s) length effective_length expected_count TPM FPKM
TRINITY_DN0_c0_g1 TRINITY_DN0_c0_g1_i1 1420.00 1155.49 34.00 333.15 308.72
TRINITY_DN100_c0_g1 TRINITY_DN100_c0_g1_i1 289.00 39.76 21.72 6183.87 5730.39
TRINITY_DN101_c0_g1 TRINITY_DN101_c0_g1_i1 1695.00 1430.49 24.00 189.96 176.03
TRINITY_DN102_c0_g1 TRINITY_DN102_c0_g1_i1 3075.00 2810.49 110.00 443.14 410.65
TRINITY_DN103_c0_g1 TRINITY_DN103_c0_g1_i1 251.00 17.89 0.00 0.00 0.00
TRINITY_DN104_c0_g1 TRINITY_DN104_c0_g1_i1 3373.00 3108.49 106.00 386.09 357.78
TRINITY_DN105_c0_g1 TRINITY_DN105_c0_g1_i1 784.00 519.49 34.00 741.02 686.68
TRINITY_DN106_c0_g1 TRINITY_DN106_c0_g1_i1 1741.00 1476.49 83.87 643.14 595.98
TRINITY_DN107_c0_g1 TRINITY_DN107_c0_g1_i1 2416.00 2151.49 60.00 315.75 292.60
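TPM and FPKM are both length- and depth-normalized values; TPM is additionally rescaled so that the values in one sample sum to one million, TPM_i = 10^6 × FPKM_i / Σ_j FPKM_j. Within a sample, the ratio TPM/FPKM is therefore the same for every expressed gene (in the rows above, 333.15/308.72 ≈ 6183.87/5730.39 ≈ 1.079). The hypothetical helper below checks both properties on a genes.results file, assuming TPM and FPKM are columns 6 and 7 as in the header shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the TPM total (expected to be close to 1e6 for a
# whole file) and the smallest and largest TPM/FPKM ratio over rows with
# non-zero FPKM. Columns 6 (TPM) and 7 (FPKM) follow the genes.results header.
tpm_fpkm_summary() {
  awk -F'\t' '
    NR == 1 { next }                # skip the header line
    { tpm += $6 }
    $7 > 0 {
      r = $6 / $7
      if (!seen || r < lo) lo = r
      if (!seen || r > hi) hi = r
      seen = 1
    }
    END { printf "TPM_sum=%.2f ratio_min=%.4f ratio_max=%.4f\n", tpm, lo, hi }
  ' "$1"
}
```

Running `tpm_fpkm_summary RSEM_Sp_ds/Sp_ds.genes.results` should report a TPM sum close to 1,000,000. The minimum and maximum ratios may drift apart somewhat for genes with very small values, because both columns are rounded to two decimals.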
head RSEM_Sp_ds/Sp_ds.isoforms.results
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
TRINITY_DN0_c0_g1_i1 TRINITY_DN0_c0_g1 1420 1155.49 34.00 333.15 308.72 100.00
TRINITY_DN100_c0_g1_i1 TRINITY_DN100_c0_g1 289 39.76 21.72 6183.87 5730.39 100.00
TRINITY_DN101_c0_g1_i1 TRINITY_DN101_c0_g1 1695 1430.49 24.00 189.96 176.03 100.00
TRINITY_DN102_c0_g1_i1 TRINITY_DN102_c0_g1 3075 2810.49 110.00 443.14 410.65 100.00
TRINITY_DN103_c0_g1_i1 TRINITY_DN103_c0_g1 251 17.89 0.00 0.00 0.00 0.00
TRINITY_DN104_c0_g1_i1 TRINITY_DN104_c0_g1 3373 3108.49 106.00 386.09 357.78 100.00
TRINITY_DN105_c0_g1_i1 TRINITY_DN105_c0_g1 784 519.49 34.00 741.02 686.68 100.00
TRINITY_DN106_c0_g1_i1 TRINITY_DN106_c0_g1 1741 1476.49 83.87 643.14 595.98 100.00
TRINITY_DN107_c0_g1_i1 TRINITY_DN107_c0_g1 2416 2151.49 60.00 315.75 292.60 100.00
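Each gene-level row aggregates the isoforms of that Trinity gene: the gene's expected_count is the sum of its isoforms' expected counts, and IsoPct is each isoform's percentage of its parent gene's abundance (100.00 in this excerpt because every gene shown has a single isoform, and 0.00 for the unexpressed one). The sketch below, with the hypothetical name `sum_isoforms_by_gene`, recomputes gene-level counts from an isoforms.results file, assuming gene_id is column 2 and expected_count is column 5 as in the header above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum isoform expected counts (column 5) per gene_id
# (column 2) of an RSEM *.isoforms.results file.
# Output: gene_id <tab> summed expected_count, sorted bytewise by line.
sum_isoforms_by_gene() {
  awk -F'\t' '
    NR > 1 { sum[$2] += $5 }        # skip the header, accumulate per gene
    END { for (g in sum) printf "%s\t%.2f\n", g, sum[g] }
  ' "$1" | LC_ALL=C sort
}
```

Comparing the result with the genes file, for instance `diff <(sum_isoforms_by_gene RSEM_Sp_ds/Sp_ds.isoforms.results) <(tail -n +2 RSEM_Sp_ds/Sp_ds.genes.results | cut -f1,5 | LC_ALL=C sort)`, should show few or no differences; any that appear are expected to be in the last digit, from summing values that were already rounded.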