# Quantification using RSEM

Like the previous exercise, we can use RSEM to estimate the expression levels of the re-constructed transcripts under the four conditions: logarithmic growth, plateau phase, heat shock and diauxic shift. First, we align the RNA-Seq reads to the Trinity transcripts using Bowtie. Then we run RSEM to estimate the number of reads mapped to each transcript. We do not need a splice-aware aligner (such as STAR) in this case because we are mapping the reads to cDNAs instead of a genomic sequence. Also the gap-free alignment produced by Bowtie is used as input for RSEM.

## Execute

Locate `util/align_and_estimate_abundance.pl` in the trinityrnaseq-2.2.0 distribution, and run

```
cd ~/LSLNGS2015/Trinity

bsub -q 4G -o ./RSEM_Sp_ds.std -e ./RSEM_Sp_ds.err -J RSEM_Sp_ds \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_ds.left.fq.gz --right RNASEQ_data/Sp_ds.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_ds --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_ds"

bsub -q 4G -o ./RSEM_Sp_hs.std -e ./RSEM_Sp_hs.err -J RSEM_Sp_hs \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_hs.left.fq.gz --right RNASEQ_data/Sp_hs.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_hs --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_hs"

bsub -q 4G -o ./RSEM_Sp_log.std -e ./RSEM_Sp_log.err -J RSEM_Sp_log \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_log.left.fq.gz --right RNASEQ_data/Sp_log.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_log --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_log"

bsub -q 4G -o ./RSEM_Sp_plat.std -e ./RSEM_Sp_plat.err -J RSEM_Sp_plat \
"PATH_TO_TRINITY/util/align_and_estimate_abundance.pl --seqType fq  \
--left RNASEQ_data/Sp_plat.left.fq.gz --right RNASEQ_data/Sp_plat.right.fq.gz \
--transcripts trinity_reference/Trinity.fasta \
--output_prefix Sp_plat --est_method RSEM --aln_method bowtie \
--trinity_mode --prep_reference --output_dir RSEM_Sp_plat"
```

Get status with `bjobs`

```
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
993742  s00ycm0 RUN   4G         alps1       2*alps1-40  RSEM_Sp_ds Nov 18 23:12
993743  s00ycm0 RUN   4G         alps1       2*alps1-41  RSEM_Sp_hs Nov 18 23:12
993744  s00ycm0 RUN   4G         alps1       2*alps1-42  *EM_Sp_log Nov 18 23:12
993745  s00ycm0 RUN   4G         alps1       2*alps1-43  *M_Sp_plat Nov 18 23:12
```

## Resource usage

| Job            | ALPS Queue Name | CPU Time   | Max Memory | Duration   |
| -------------- | --------------- | ---------- | ---------- | ---------- |
| RSEM\_Sp\_ds   | 4G              | 42.28 sec. | -          | 32 seconds |
| RSEM\_Sp\_hs   | 4G              | 36.40 sec. | -          | 26 seconds |
| RSEM\_Sp\_log  | 4G              | 14.28 sec. | -          | 13 seconds |
| RSEM\_Sp\_plat | 4G              | 48.40 sec. | -          | 38 seconds |

## Estimations

Once the jobs are completed, we will find `*.isoforms.results` and `*.genes.results` in the output folders. These files contain the expected counts and normalized expression values of the Trinity transcripts (isoforms) and components (genes).

`ls -la RSEM_Sp_*/*results`

```
-rw-rw-r-- 1 ycl6 ycl6 28893 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30731 Oct 27 12:42 RSEM_Sp_ds/Sp_ds.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28579 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30369 Oct 27 12:42 RSEM_Sp_hs/Sp_hs.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28928 Oct 27 12:42 RSEM_Sp_log/Sp_log.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30754 Oct 27 12:42 RSEM_Sp_log/Sp_log.isoforms.results
-rw-rw-r-- 1 ycl6 ycl6 28866 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.genes.results
-rw-rw-r-- 1 ycl6 ycl6 30685 Oct 27 12:42 RSEM_Sp_plat/Sp_plat.isoforms.results
```

We can use `head` to examine these files. Your values may not be the same because the assembly results are not deterministic.

`head RSEM_Sp_ds/Sp_ds.genes.results`

```
gene_id transcript_id(s)        length  effective_length        expected_count  TPM     FPKM
TRINITY_DN0_c0_g1       TRINITY_DN0_c0_g1_i1    1420.00 1155.49 34.00   333.15  308.72
TRINITY_DN100_c0_g1     TRINITY_DN100_c0_g1_i1  289.00  39.76   21.72   6183.87 5730.39
TRINITY_DN101_c0_g1     TRINITY_DN101_c0_g1_i1  1695.00 1430.49 24.00   189.96  176.03
TRINITY_DN102_c0_g1     TRINITY_DN102_c0_g1_i1  3075.00 2810.49 110.00  443.14  410.65
TRINITY_DN103_c0_g1     TRINITY_DN103_c0_g1_i1  251.00  17.89   0.00    0.00    0.00
TRINITY_DN104_c0_g1     TRINITY_DN104_c0_g1_i1  3373.00 3108.49 106.00  386.09  357.78
TRINITY_DN105_c0_g1     TRINITY_DN105_c0_g1_i1  784.00  519.49  34.00   741.02  686.68
TRINITY_DN106_c0_g1     TRINITY_DN106_c0_g1_i1  1741.00 1476.49 83.87   643.14  595.98
TRINITY_DN107_c0_g1     TRINITY_DN107_c0_g1_i1  2416.00 2151.49 60.00   315.75  292.60
```

`head RSEM_Sp_ds/Sp_ds.isoforms.results`

```
transcript_id   gene_id length  effective_length        expected_count  TPM     FPKM    IsoPct
TRINITY_DN0_c0_g1_i1    TRINITY_DN0_c0_g1       1420    1155.49 34.00   333.15  308.72  100.00
TRINITY_DN100_c0_g1_i1  TRINITY_DN100_c0_g1     289     39.76   21.72   6183.87 5730.39 100.00
TRINITY_DN101_c0_g1_i1  TRINITY_DN101_c0_g1     1695    1430.49 24.00   189.96  176.03  100.00
TRINITY_DN102_c0_g1_i1  TRINITY_DN102_c0_g1     3075    2810.49 110.00  443.14  410.65  100.00
TRINITY_DN103_c0_g1_i1  TRINITY_DN103_c0_g1     251     17.89   0.00    0.00    0.00    0.00
TRINITY_DN104_c0_g1_i1  TRINITY_DN104_c0_g1     3373    3108.49 106.00  386.09  357.78  100.00
TRINITY_DN105_c0_g1_i1  TRINITY_DN105_c0_g1     784     519.49  34.00   741.02  686.68  100.00
TRINITY_DN106_c0_g1_i1  TRINITY_DN106_c0_g1     1741    1476.49 83.87   643.14  595.98  100.00
TRINITY_DN107_c0_g1_i1  TRINITY_DN107_c0_g1     2416    2151.49 60.00   315.75  292.60  100.00
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ycl6.gitbook.io/rna-seq-data-analysis/de_novo_assembly_using_trinity/quantification_using_rsem2.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
