De novo assembly using Trinity
Trinity is one of the most popular software packages for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. It consists of three modules (Inchworm, Chrysalis and Butterfly) that run sequentially to process the sequencing reads.
Quoted from the Trinity GitHub page:
Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
Chrysalis clusters the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that corresponds to paralogous genes.
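In practice the three modules are not launched one by one: the Trinity wrapper script runs Inchworm, Chrysalis and Butterfly in turn. The sketch below only builds and prints a typical command for a single paired-end, strand-specific sample, and runs it only if Trinity is on PATH and both read files exist. The file names, the RF library type (assuming a dUTP-style strand-specific protocol), and the memory and CPU values are assumptions to adjust for your data and machine.

```shell
#!/usr/bin/env bash
# Sketch: one Trinity run on a single paired-end, strand-specific sample.
# File names, --SS_lib_type RF, memory and CPU values are assumptions.
set -euo pipefail

left="Sp_log.left.fq.gz"     # hypothetical file name
right="Sp_log.right.fq.gz"   # hypothetical file name

# --seqType fq: FASTQ input; --SS_lib_type RF: strand-specific paired reads;
# trinity_out_dir is Trinity's default output directory name
cmd=(Trinity --seqType fq --SS_lib_type RF
     --left "$left" --right "$right"
     --max_memory 2G --CPU 2 --output trinity_out_dir)

# Show the command, then run it only if Trinity and both read files exist
printf '%s ' "${cmd[@]}"; echo
if command -v Trinity >/dev/null 2>&1 && [[ -f $left && -f $right ]]; then
  "${cmd[@]}"
fi
```

The same command drives all three stages; Trinity writes the final assembly into the output directory once Butterfly finishes.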
Materials
The Trinity developers provide training materials, with the raw data and the required software built into a VirtualBox image (Trinity2015.ova); I have saved a copy on ALPS1. The data are 76-bp, strand-specific, paired-end Illumina RNA-Seq reads derived from Schizosaccharomyces pombe (fission yeast) grown under 4 conditions:
logarithmic growth (Sp_log)
plateau phase (Sp_plat)
heat shock (Sp_hs)
diauxic shift (Sp_ds)
* Due to GitBook's space limitations, the fq.gz files are not provided here; please obtain them from the VirtualBox image [Link]
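Trinity accepts comma-separated lists of files for its --left and --right options, so the four conditions can be pooled into a single assembly. A minimal sketch that builds those two lists, assuming the files follow a hypothetical <sample>.left.fq.gz / <sample>.right.fq.gz naming scheme; it also warns about any file that has not been copied from the VirtualBox image yet.

```shell
#!/usr/bin/env bash
# Sketch: build Trinity's comma-separated --left/--right lists for the four
# conditions. The <sample>.left.fq.gz / <sample>.right.fq.gz names are assumed.
set -euo pipefail

samples=(Sp_log Sp_plat Sp_hs Sp_ds)
left_list=""
right_list=""

for s in "${samples[@]}"; do
  # ${var:+,} expands to a comma only once the list is non-empty
  left_list+="${left_list:+,}${s}.left.fq.gz"
  right_list+="${right_list:+,}${s}.right.fq.gz"
  # Warn (but do not stop) if a read file is missing
  for f in "${s}.left.fq.gz" "${s}.right.fq.gz"; do
    [[ -f $f ]] || echo "warning: $f not found" >&2
  done
done

echo "--left $left_list"
echo "--right $right_list"
```

The two printed lines can be passed to Trinity's --left and --right options; keeping the left and right lists in the same sample order keeps the mates paired.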
Software
Trinity
v2.2.0 [17 Mar 2016] - Latest version available at the time of writing and used in this exercise
v2.0.6 [13 Mar 2015] - Latest version available on ALPS1
Bowtie
v1.1.2 [23 Jun 2015] - Latest version available at the time of writing and used in this exercise
v1.0.1 [14 Mar 2014] - Latest version available on ALPS1
GMAP (Genomic Mapping and Alignment Program)
v2016-09-23 - Latest version available at the time of writing and used in this exercise
STAR (Spliced Transcripts Alignment to a Reference)
v2.5.2b [20 Aug 2016] - Latest version available at the time of writing and used in this exercise
v2.3.0e [14 Feb 2013] - Latest version available on ALPS1
v1.3.1 [22 Apr 2016] - Latest version available at the time of writing and used in this exercise
v1.2 [02 Feb 2015] - Latest version available on ALPS1
RSEM (RNA-Seq by Expectation-Maximization)
v1.3.0 [02 Oct 2016] - Latest version available at the time of writing
v1.2.31 [04 Jun 2016] - Version used in this exercise
v1.2.19 [05 Nov 2014] - Latest version available on ALPS1
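Because the versions on ALPS1 lag behind the ones used in this exercise, it helps to record which copy of each tool is actually picked up. A minimal sketch, assuming each program prints its version with --version (a tool that does not will simply show the first line of its error or usage text); samtools is included on the assumption that it is installed alongside Trinity, and a missing tool is reported rather than stopping the script.

```shell
#!/usr/bin/env bash
# Sketch: report which version of each tool is picked up from PATH.
# Assumes each tool accepts --version; missing tools are reported, not fatal.
report_versions() {
  local t
  for t in Trinity bowtie samtools gmap STAR rsem-calculate-expression; do
    if command -v "$t" >/dev/null 2>&1; then
      # keep only the first line, which normally holds the version string
      printf '%-26s %s\n' "$t" "$("$t" --version 2>&1 | head -n 1)"
    else
      printf '%-26s %s\n' "$t" "not found"
    fi
  done
}

report_versions
```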
Set JAVA_HOME and PATH
Bowtie 1 (NOT Bowtie 2) is required by the Chrysalis module.
* Below is an example showing how to set up the paths; remember to change the paths to these binaries to match your own installation.
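A minimal sketch of such a setup in bash. The Java directory and every tool directory are hypothetical placeholders; the version numbers in them are borrowed from the list above only to make the example concrete.

```shell
#!/usr/bin/env bash
# Example only: every directory below is a hypothetical install location.
# Replace each one with the real path on your system (e.g. on ALPS1).
export JAVA_HOME="$HOME/software/jdk1.8.0"
export PATH="$JAVA_HOME/bin:$PATH"

# Prepend each tool directory so these copies take precedence over any
# older system-wide versions found later in PATH
export PATH="$HOME/software/trinityrnaseq-2.2.0:$PATH"
export PATH="$HOME/software/bowtie-1.1.2:$PATH"
export PATH="$HOME/software/gmap-2016-09-23/bin:$PATH"
export PATH="$HOME/software/STAR-2.5.2b/bin/Linux_x86_64:$PATH"
export PATH="$HOME/software/RSEM-1.2.31:$PATH"
```

Adding these lines to ~/.bashrc (or to a script you source before each session) keeps the settings across logins.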
You can use echo $PATH to check the new PATH variable.
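Beyond reading the raw PATH string, it is worth confirming that each binary actually resolves. A minimal sketch that prints the PATH entries one per line and then checks a short list of binaries; the list itself is an assumption, but it includes bowtie and bowtie-build because Chrysalis needs Bowtie 1, so having only bowtie2 on PATH is not enough.

```shell
#!/usr/bin/env bash
# Sketch: list PATH entries one per line, then confirm that each binary
# resolves through PATH. The tool list is an assumption; "bowtie" itself
# (Bowtie 1) must be found, since "bowtie2" alone will not satisfy Chrysalis.
echo "$PATH" | tr ':' '\n'

check_tools() {
  local t path missing=0
  for t in java Trinity bowtie bowtie-build samtools; do
    if path=$(command -v "$t"); then
      echo "OK       $t -> $path"
    else
      echo "MISSING  $t"
      missing=$((missing + 1))
    fi
  done
  # non-zero exit status = number of missing tools
  return "$missing"
}

check_tools || echo "Fix PATH before running Trinity." >&2
```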