Gene-level RNA-Seq Data Analysis (legacy)

Compare de novo reconstructed transcripts to reference annotations

Download Schizosaccharomyces pombe genome and annotation

mkdir ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data

wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
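
As a quick sanity check on the download, the index written by samtools faidx can be summarized with standard tools. The sketch below assumes the usual tab-separated .fai layout, where column 1 is the sequence name and column 2 is its length in bases. The function name fai_summary is only illustrative.

```shell
# Print each sequence name and its length, then the total length,
# from a samtools .fai index (column 1 = name, column 2 = length).
fai_summary() {
    awk -F '\t' '{ printf "%s\t%d\n", $1, $2; total += $2 }
                 END { printf "total\t%d\n", total }' "$1"
}

# Usage, after samtools faidx has finished:
# fai_summary Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
```

The sequence names printed here are the ones to use later when typing a locus into IGV.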

Build the GMAP database

Usage

gmap_build [options...] -d <genomename> <fasta_files>

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

ALPS Queue Name    CPU Time      Max Memory    Duration
16G                78.21 sec.    -             1 minute 41 seconds

Options

-d Genome name.

-D Destination directory.

-k k-mer value for genomic index (allowed: 15 or less).

Align Trinity transcripts to genome using GMAP

Usage

gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \
"gmap -t 6 -n 0 -D . -d gmap_spo trinity_reference/Trinity.fasta -f samse > trinity_gmap.sam"

Resource usage

ALPS Queue Name    CPU Time      Max Memory    Duration
16G                11.90 sec.    -             7 seconds

Options

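-t Number of worker threads.

-n Maximum number of paths to show. If set to 0, GMAP reports a single alignment, plus chimeric alignments when a chimera is detected.

-D Genome directory (the directory that contains the GMAP database).

-d Genome database.

-f Output format. samse writes SAM output for single-end sequences.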
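
Once trinity_gmap.sam has been written, it can be useful to see how many alignment records GMAP produced. The following is a minimal sketch that relies only on the SAM convention that bit 0x4 of the FLAG field (column 2) marks an unmapped record. The function name sam_mapping_counts is hypothetical. Because -n 0 can report chimeric alignments, the number of mapped records may exceed the number of transcripts.

```shell
# Count mapped and unmapped records in a SAM file.
# Bit 0x4 of the FLAG field (column 2) is set for unmapped records;
# the test uses plain arithmetic so it works in any POSIX awk.
sam_mapping_counts() {
    awk -F '\t' '
        /^@/ { next }   # skip header lines
        { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
        END { printf "mapped\t%d\nunmapped\t%d\n", mapped, unmapped }
    ' "$1"
}

# Usage:
# sam_mapping_counts trinity_gmap.sam
```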

Convert SAM to sorted BAM and create index

Convert SAM to BAM and sort by coordinates

bsub -q 16G -J sort \
"samtools sort -T trinity_gmap -o trinity_gmap.bam trinity_gmap.sam"

After the BAM file is generated, create its index

samtools index trinity_gmap.bam

Build the STAR database

cd ~/LSLNGS2015/Trinity
mkdir star_spo

bsub -q 16G -o ./star_build.std -e ./star_build.err -J star_build \
"STAR --runThreadN 6 --runMode genomeGenerate --genomeDir star_spo \
--genomeFastaFiles GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

ALPS Queue Name    CPU Time      Max Memory    Duration
16G                40.19 sec.    -             29 seconds
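
The build above uses STAR's default index settings. For small genomes, the STAR manual suggests scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1). This option is not part of the original command, so treat it as optional. The sketch below only computes the suggested value. The helper name star_sa_index_nbases is hypothetical, and the genome length would come from summing column 2 of the .fai index.

```shell
# Compute min(14, floor(log2(genome_length)/2 - 1)), the value the STAR
# manual suggests for --genomeSAindexNbases on small genomes.
star_sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)
        if (n > 14) n = 14
        print n
    }'
}

# Usage with a genome of about 12.6 Mb (roughly the size of S. pombe):
# star_sa_index_nbases 12600000
```

If you choose to use it, the result would be passed as --genomeSAindexNbases in the genomeGenerate command.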

Align RNA-Seq reads to genome using STAR

mkdir star_alignment

bsub -q 16G -o ./star_map.std -e ./star_map.err -J star_map \
"STAR --genomeDir star_spo \
--readFilesIn RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz \
--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate  --outFilterMultimapNmax 10 \
--outSAMunmapped None --twopassMode Basic \
--runThreadN 6 --outFileNamePrefix \"star_alignment/\""

Resource usage

ALPS Queue Name    CPU Time      Max Memory    Duration
16G                157.56 sec.   4 GB          1 minute 33 seconds

After the BAM file is generated, create its index

samtools index star_alignment/Aligned.sortedByCoord.out.bam
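
STAR also writes a summary file, Log.final.out, under the output prefix (here star_alignment/). Each statistic in it is a label and a value separated by a "|" character. The following sketch pulls a single value out by its label. The function name star_log_value is illustrative, and the label text must match the log exactly.

```shell
# Print the value for one statistic in STAR's Log.final.out.
# $1 = path to Log.final.out, $2 = exact label text left of the "|".
star_log_value() {
    awk -F '|' -v key="$2" '
        {
            label = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim the label
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the value
            if (label == key) { print value; exit }
        }
    ' "$1"
}

# Usage:
# star_log_value star_alignment/Log.final.out 'Uniquely mapped reads %'
```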

List files

Use ls -la to list the files in your Trinity folder:

drwxrwxr-x 14 ycl6 ycl6   4096 Oct 27 12:52 .
drwxrwxr-x  6 ycl6 ycl6   4096 Nov  1 13:44 ..
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 11:58 GENOME_data
drwxrwxr-x  3 ycl6 ycl6   4096 Oct 27 10:55 gmap_spo
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:19 RNASEQ_data
drwxrwxr-x  4 ycl6 ycl6   4096 Oct 27 12:36 star_alignment
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:32 star_spo
-rw-rw-r-- 1 ycl6 ycl6 132734 Oct 27 12:25 trinity_gmap.bam
-rw-rw-r-- 1 ycl6 ycl6   9872 Oct 27 12:27 trinity_gmap.bam.bai
-rw-rw-r-- 1 ycl6 ycl6 454354 Oct 27 12:23 trinity_gmap.sam
drwxrwxr-x 4 ycl6 ycl6   4096 Oct 27 12:41 trinity_reference
-rw-rw-r-- 1 ycl6 ycl6  21181 Oct 27 12:22 trinity_reference.log

Visualize data using IGV

Download data file to your computer

Use WinSCP or a similar SFTP client to download the following files from the remote server (e.g. ALPS1) to your local PC or laptop:

GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai

Launch IGV via Java Web Start

The Integrative Genomics Viewer (IGV) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.

Go to the IGV downloads page (http://software.broadinstitute.org/software/igv/download). When prompted, register or log in as requested. Click the launch icon to initiate the web start launch window.

After launching IGV

  1. Go to Genomes -> Load Genome from File, and select Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa. Click Open.

  2. Go to File -> Load from File, and select Aligned.sortedByCoord.out.bam, trinity_gmap.bam and Schizosaccharomyces_pombe.ASM294v2.33.gff3. Click Open.

Change the chromosome range to I:577,956-583,212.

In the track showing Aligned.sortedByCoord.out.bam, right-click and choose Color alignments by > read strand. The paired-end reads are then drawn in blue or red according to their strand.
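
The same loading steps can also be written as an IGV batch script and run from Tools > Run Batch Script. The sketch below is an optional alternative to clicking through the menus. It assumes that all downloaded files sit in one local directory, and the function name write_igv_batch and the output file name are hypothetical. It uses only the new, genome, load and goto batch commands.

```shell
# Write an IGV batch script that loads the genome, both BAM files and the
# annotation, then jumps to the region used in this tutorial.
# $1 = directory holding the downloaded files, $2 = batch file to create.
write_igv_batch() {
    cat > "$2" <<EOF
new
genome $1/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
load $1/Aligned.sortedByCoord.out.bam
load $1/trinity_gmap.bam
load $1/Schizosaccharomyces_pombe.ASM294v2.33.gff3
goto I:577,956-583,212
EOF
}

# Usage:
# write_igv_batch "$HOME/igv_data" spombe.igv.txt
```

The .fai and .bai index files must sit next to the FASTA and BAM files, as they do when opening the files by hand.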