De novo assembly using Trinity

Trinity is one of the most popular software packages for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. It consists of three software modules, Inchworm, Chrysalis and Butterfly, which run sequentially to process the sequencing reads.

Quote from the Trinity GitHub:

  • Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.

  • Chrysalis clusters the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.

  • Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that corresponds to paralogous genes.

Materials

The Trinity developers have provided training materials, and the raw data and required software are built into a VirtualBox image (Trinity2015.ova). I have saved a copy on ALPS1. The RNA-Seq data are 76 bp strand-specific Illumina paired-end reads derived from Schizosaccharomyces pombe (fission yeast) grown under four conditions:

  1. logarithmic growth (Sp_log)

  2. plateau phase (Sp_plat)

  3. heat shock (Sp_hs)

  4. diauxic shift (Sp_ds)

* Due to GitBook's space limitations, the fq.gz files are not provided here; please obtain them from the VirtualBox image [Link]

-rw-rw-r-- 1 ycl6 ycl6  5790168 Oct 27 11:35 RNASEQ_data/Sp_ds.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5590326 Oct 27 11:35 RNASEQ_data/Sp_ds.right.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5815390 Oct 27 11:35 RNASEQ_data/Sp_hs.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5751383 Oct 27 11:36 RNASEQ_data/Sp_hs.right.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  2154125 Oct 27 11:36 RNASEQ_data/Sp_log.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  2097534 Oct 27 11:36 RNASEQ_data/Sp_log.right.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5488286 Oct 27 11:36 RNASEQ_data/Sp_plat.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5238362 Oct 27 11:36 RNASEQ_data/Sp_plat.right.fq.gz
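The listing follows a simple naming convention, `<condition>.<left|right>.fq.gz`. As a quick sanity check after copying the files out of the VirtualBox image, the sketch below confirms that both mates exist for each of the four conditions and reports the read count and the longest read among the first 1,000 records (expected to be at most 76 bp for this data set). The function names are my own, not part of Trinity, and the code assumes standard four-line FASTQ records and that `gzip` is on PATH.

```shell
# Count reads in a gzipped FASTQ file (assumes 4 lines per record).
count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Longest sequence (line 2 of each record) among the first 1000 records.
max_read_length() {
  gzip -dc "$1" | head -n 4000 |
    awk 'BEGIN { max = 0 } NR % 4 == 2 && length($0) > max { max = length($0) } END { print max }'
}

# Hypothetical helper: check both mates for the four conditions in a directory.
# Returns 1 if any expected file is missing.
check_samples() {
  local dir=${1:-RNASEQ_data}
  local sample mate f status=0
  for sample in Sp_log Sp_plat Sp_hs Sp_ds; do
    for mate in left right; do
      f="$dir/$sample.$mate.fq.gz"
      if [ ! -f "$f" ]; then
        echo "MISSING $f"
        status=1
        continue
      fi
      printf '%s\t%s reads\tmax length %s\n' "$f" "$(count_reads "$f")" "$(max_read_length "$f")"
    done
  done
  return $status
}
```

Run it from the directory that contains RNASEQ_data, e.g. `check_samples RNASEQ_data`; a non-zero exit status means at least one of the eight files is missing.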

Software

Trinity

  • v2.2.0 [17 Mar 2016] - Latest version available at the time of writing and used in this exercise

  • v2.0.6 [13 Mar 2015] - Latest version available on ALPS1

Bowtie

  • v1.1.2 [23 Jun 2015] - Latest version available at the time of writing and used in this exercise

  • v1.0.1 [14 Mar 2014] - Latest version available on ALPS1

GMAP (Genomic Mapping and Alignment Program)

  • v2016-09-23 - Latest version available at the time of writing and used in this exercise

STAR (Spliced Transcripts Alignment to a Reference)

  • v2.5.2b [20 Aug 2016] - Latest version available at the time of writing and used in this exercise

  • v2.3.0e [14 Feb 2013] - Latest version available on ALPS1

Samtools

  • v1.3.1 [22 Apr 2016] - Latest version available at the time of writing and used in this exercise

  • v1.2 [02 Feb 2015] - Latest version available on ALPS1

RSEM (RNA-Seq by Expectation-Maximization)

  • v1.3.0 [02 Oct 2016] - Latest version available at the time of writing

  • v1.2.31 [04 Jun 2016] - Version used in this exercise

  • v1.2.19 [05 Nov 2014] - Latest version available on ALPS1
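Several ALPS1 copies are older than the versions used in this exercise. A minimal helper for making that comparison explicit is sketched below, assuming GNU `sort -V` (version sort) is available. `version_ge` and `compare_alps1_versions` are hypothetical names, and only the Trinity, STAR and RSEM version pairs from the list above are hard-coded.

```shell
# version_ge A B: succeed (exit 0) if version A >= version B.
# Assumption: `sort -V` (GNU coreutils version sort) is available.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version pairs copied from the list above: "used in this exercise" then "on ALPS1".
compare_alps1_versions() {
  local tool used alps1
  while read -r tool used alps1; do
    if version_ge "$alps1" "$used"; then
      echo "$tool: ALPS1 $alps1 is at least $used"
    else
      echo "$tool: ALPS1 $alps1 is older than $used"
    fi
  done <<'EOF'
Trinity 2.2.0 2.0.6
STAR 2.5.2b 2.3.0e
RSEM 1.2.31 1.2.19
EOF
}
```

Version sort compares numeric fields numerically, so 1.2.9 sorts before 1.2.31, whereas a plain lexical sort would put 1.2.31 first.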

Set JAVA_HOME and PATH

Bowtie 1 (NOT Bowtie 2) is required by the Chrysalis module.

* When setting up the paths, remember to change the paths to these binaries to match your own installation.
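A minimal sketch of such a setup, assuming every tool was unpacked under a hypothetical `$HOME/apps` directory. All directory names below are placeholders, not the ALPS1 locations. JAVA_HOME matters because Trinity's Butterfly module is written in Java.

```shell
# Hypothetical install root: replace with wherever the tools actually live.
APPS="$HOME/apps"

# Java for Butterfly (placeholder JDK location).
export JAVA_HOME="$APPS/jdk"
export PATH="$JAVA_HOME/bin:$PATH"

# Tool directories (placeholder names; point these at your real install paths).
export PATH="$APPS/trinityrnaseq-2.2.0:$PATH"
export PATH="$APPS/bowtie-1.1.2:$PATH"            # Bowtie 1, not Bowtie 2
export PATH="$APPS/samtools-1.3.1:$PATH"
export PATH="$APPS/STAR-2.5.2b/bin/Linux_x86_64:$PATH"
export PATH="$APPS/gmap-2016-09-23/bin:$PATH"
export PATH="$APPS/RSEM-1.2.31:$PATH"
```

Prepending each directory (`dir:$PATH`) rather than appending it means these copies are found before any older system-wide installation, such as the ALPS1 versions listed above. To make the settings persistent, the same lines can be added to `~/.bashrc`.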

You can use echo $PATH to check the new PATH variable.
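Beyond `echo $PATH`, a few small helpers make the check easier to read: `show_path` prints one entry per line, `check_tools` reports where each command resolves, and `major_version` extracts the major version from a `--version` line, which is a quick way to confirm that the `bowtie` on PATH is Bowtie 1. These names are hypothetical, and the parsing assumes the version line contains `version X.Y.Z`; compare against your own `bowtie --version` output.

```shell
# Print each PATH entry (or the entries of a given PATH-like string) on its own line.
show_path() {
  printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

# Report where each command resolves; return 1 if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool -> $(command -v "$tool")"
    else
      echo "$tool -> NOT FOUND"
      missing=1
    fi
  done
  return $missing
}

# Read a version line on stdin and print its major version.
# Assumption: the line looks like "... version X.Y.Z".
major_version() {
  sed -n 's/.*version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}
```

For example, `check_tools Trinity bowtie samtools STAR gmap rsem-calculate-expression java` lists any command that is still missing, and `bowtie --version | major_version` should print 1 if Bowtie 1 is first on PATH.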
