Gene-level RNA-Seq Data Analysis (legacy)

De novo assembly using Trinity


Last updated 5 years ago


Trinity is one of the most popular software packages for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. It consists of three software modules, Inchworm, Chrysalis and Butterfly, that run sequentially to process the sequencing reads.

Quoted from the Trinity GitHub page:

  • Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.

  • Chrysalis clusters the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.

  • Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that corresponds to paralogous genes.
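To make the k-mer and de Bruijn graph ideas behind Inchworm and Chrysalis concrete, here is a toy shell sketch. It is not how Trinity is implemented: the helpers `count_kmers` and `debruijn_edges` are hypothetical, k is a parameter so the idea can be tried on tiny inputs (Trinity itself uses 25-mers), and the input is assumed to be an uncompressed four-line-per-record FASTQ file.

```shell
#!/usr/bin/env bash
# Toy illustration only (not Trinity's implementation); helper names are hypothetical.
# Input: an uncompressed FASTQ file (4 lines per record, sequence on the 2nd line).

# count_kmers K FASTQ: print every k-mer with its count, most frequent first.
count_kmers() {
  local k=$1 fastq=$2
  awk -v k="$k" 'NR % 4 == 2 {
      for (i = 1; i <= length($0) - k + 1; i++) counts[substr($0, i, k)]++
    }
    END { for (kmer in counts) print kmer "\t" counts[kmer] }' "$fastq" |
    sort -k2,2nr -k1,1
}

# debruijn_edges K FASTQ: print each edge between consecutive k-mers of a read
# (consecutive k-mers overlap by k-1 bases), once per distinct edge.
debruijn_edges() {
  local k=$1 fastq=$2
  awk -v k="$k" 'NR % 4 == 2 {
      for (i = 1; i + k <= length($0); i++) print substr($0, i, k) " -> " substr($0, i + 1, k)
    }' "$fastq" | sort -u
}
```

For example, `count_kmers 25 <(gzip -dc RNASEQ_data/Sp_log.left.fq.gz) | head` would list the most frequent 25-mers in the log-phase left reads. On a single read `ACGTACGT` with k = 4, `debruijn_edges` yields a cycle ending in `TACG -> ACGT`, which is the kind of structure Butterfly later has to resolve into paths.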

Materials

The Trinity developers have provided training materials, with the raw data and the required software bundled into a VirtualBox image (Trinity2015.ova). I have saved a copy on ALPS1. The RNA-Seq data are 76 bp strand-specific Illumina paired-end reads derived from Schizosaccharomyces pombe (fission yeast) grown under four conditions:

  1. logarithmic growth (Sp_log)

  2. plateau phase (Sp_plat)

  3. heat shock (Sp_hs)

  4. diauxic shift (Sp_ds)

* Due to GitBook's space limitations, the fq.gz files are not provided here; please obtain them from the VirtualBox image.

-rw-rw-r-- 1 ycl6 ycl6  5790168 Oct 27 11:35 RNASEQ_data/Sp_ds.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5590326 Oct 27 11:35 RNASEQ_data/Sp_ds.right.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5815390 Oct 27 11:35 RNASEQ_data/Sp_hs.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5751383 Oct 27 11:36 RNASEQ_data/Sp_hs.right.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  2154125 Oct 27 11:36 RNASEQ_data/Sp_log.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  2097534 Oct 27 11:36 RNASEQ_data/Sp_log.right.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5488286 Oct 27 11:36 RNASEQ_data/Sp_plat.left.fq.gz
-rw-rw-r-- 1 ycl6 ycl6  5238362 Oct 27 11:36 RNASEQ_data/Sp_plat.right.fq.gz
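Before assembling, it can help to confirm that the eight files are complete. The sketch below uses a hypothetical helper, `check_pairs`, and assumes the `<condition>.left.fq.gz` / `<condition>.right.fq.gz` naming shown in the listing. For each condition it checks that both mates exist and contain the same number of reads, and it reports the distinct read lengths in the left mate, which should be 76 for this dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check paired-end FASTQ files named <condition>.{left,right}.fq.gz.
# Usage: check_pairs [directory]   (defaults to RNASEQ_data)
# Prints one line per condition; returns non-zero if a mate is missing or read counts differ.
check_pairs() {
  local dir=${1:-RNASEQ_data} status=0
  local cond left right n_left n_right lengths
  for cond in Sp_log Sp_plat Sp_hs Sp_ds; do
    left=$dir/$cond.left.fq.gz
    right=$dir/$cond.right.fq.gz
    if [[ ! -f $left || ! -f $right ]]; then
      echo "$cond: missing mate file(s)" >&2
      status=1
      continue
    fi
    # Each FASTQ record spans 4 lines, so reads = lines / 4
    n_left=$(gzip -dc "$left" | awk 'END { print NR / 4 }')
    n_right=$(gzip -dc "$right" | awk 'END { print NR / 4 }')
    # Distinct read lengths in the left mate (sequence lines are lines 2, 6, 10, ...)
    lengths=$(gzip -dc "$left" | awk 'NR % 4 == 2 { print length($0) }' | sort -nu | paste -sd, -)
    if [[ $n_left != "$n_right" ]]; then
      echo "$cond: read counts differ ($n_left vs $n_right)" >&2
      status=1
    fi
    printf '%s\tleft=%s\tright=%s\tlengths=%s\n' "$cond" "$n_left" "$n_right" "$lengths"
  done
  return "$status"
}
```

Run it from the directory that contains `RNASEQ_data/` as `check_pairs RNASEQ_data`. Using `gzip -dc` rather than `zcat` keeps it portable to systems where `zcat` only handles `.Z` files.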

Software

Trinity

  • v2.2.0 [17 Mar 2016] - Latest version available at the time of writing and used in this exercise

  • v2.0.6 [13 Mar 2015] - Latest version available on ALPS1

Bowtie

  • v1.1.2 [23 Jun 2015] - Latest version available at the time of writing and used in this exercise

  • v1.0.1 [14 Mar 2014] - Latest version available on ALPS1

GMAP (Genomic Mapping and Alignment Program)

  • v2016-09-23 - Latest version available at the time of writing and used in this exercise

STAR (Spliced Transcripts Alignment to a Reference)

  • v2.5.2b [20 Aug 2016] - Latest version available at the time of writing and used in this exercise

  • v2.3.0e [14 Feb 2013] - Latest version available on ALPS1

SAMtools

  • v1.3.1 [22 Apr 2016] - Latest version available at the time of writing and used in this exercise

  • v1.2 [02 Feb 2015] - Latest version available on ALPS1

RSEM (RNA-Seq by Expectation-Maximization)

  • v1.3.0 [02 Oct 2016] - Latest version available at the time of writing

  • v1.2.31 [04 Jun 2016] - Version used in this exercise

  • v1.2.19 [05 Nov 2014] - Latest version available on ALPS1
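Since several of the versions installed on ALPS1 are older than those used in this exercise, a small comparison helper can flag an outdated tool. This is a sketch: `version_ge` is a hypothetical name, and it relies on `sort -V` (version sort), which GNU coreutils provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if version $1 >= version $2.
# `sort -V` orders version strings numerically per component, e.g. 1.2.19 < 1.2.31 < 1.3.0,
# so the smaller of the two versions ends up on the first line.
version_ge() {
  [[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}
```

For example, with SAMtools 1.x on your PATH, `version_ge "$(samtools --version | head -n 1 | cut -d' ' -f2)" 1.3.1 || echo "SAMtools is older than 1.3.1"` warns when the installed copy predates the one used here (SAMtools 1.3.1 prints `samtools 1.3.1` on the first line of `--version`).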

Set JAVA_HOME and PATH

Bowtie 1 (NOT Bowtie 2) is required by the Chrysalis module.

* Below is an example of how to set up the paths; remember to change the paths to these binaries to match your own installation.

cd ~/

# JAVA_HOME must point to the JDK installation directory, not to the java binary itself
export JAVA_HOME=/pkg/java/jdk1.7.0_51

# Prepend Java, Bowtie 1, RSEM, R, GMAP, SAMtools, STAR and Trinity to the search path
export PATH=$JAVA_HOME/bin:/pkg/biology/Bowtie/bowtie-1.0.1:\
/work3/LSLNGS2015/Tools/RSEM-1.2.23:/pkg/biology/R/R-3.1.2/bin:\
/work3/LSLNGS2015/Tools/gmap-2015-09-29/bin:/pkg/biology/samtools/samtools-1.2:\
/work3/LSLNGS2015/Tools/STAR-STAR_2.4.2a/bin/Linux_x86_64_static:\
/pkg/biology/trinity/trinityrnaseq-2.0.6:$PATH

You can use echo $PATH to check the new PATH variable.
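Printing PATH shows the directories, but not whether each program actually resolves. The sketch below (hypothetical helper `check_tools`) reports where each required executable is found and checks that JAVA_HOME names a JDK directory containing `bin/java`, so a JAVA_HOME that points at the java binary itself is flagged. The executable names are assumed from the standard distributions of these packages; adjust the list if your installation differs. Because Bowtie 2 only installs `bowtie2*` programs, a Bowtie 2-only setup fails the `bowtie` check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report where each required program resolves on PATH.
# The executable names below are assumed from the standard distributions; edit as needed.
# Returns non-zero if JAVA_HOME looks wrong or any program is missing.
check_tools() {
  local status=0 tool
  # JAVA_HOME should be the JDK directory, i.e. the one that contains bin/java
  if [[ ! -x ${JAVA_HOME:-}/bin/java ]]; then
    echo "JAVA_HOME does not look like a JDK directory: '${JAVA_HOME:-}'" >&2
    status=1
  fi
  # Bowtie 1 provides bowtie/bowtie-build; Bowtie 2 only provides bowtie2*
  for tool in java bowtie bowtie-build rsem-calculate-expression R gmap samtools STAR Trinity; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%-26s %s\n' "$tool" "$(command -v "$tool")"
    else
      printf '%-26s NOT FOUND\n' "$tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_tools` right after the `export` lines; any `NOT FOUND` line points to a directory that is missing from, or mistyped in, PATH.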
