Compare de novo reconstructed transcripts to reference annotations

Download Schizosaccharomyces pombe genome and annotation

mkdir -p ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data

wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
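Before building any index, it can be worth confirming that the decompressed FASTA is intact. The sketch below counts the header lines in a FASTA file and compares that count with the number of entries in the .fai index written by samtools faidx. The function names are illustrative and are not part of samtools.

```shell
# Count sequences in a FASTA file (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Compare the FASTA record count with the number of lines in its .fai index.
# Returns 0 when the two counts match, 1 otherwise.
check_fai_matches() {
    local fasta="$1"
    local fai="${fasta}.fai"
    [ -f "$fai" ] || { echo "missing index: $fai" >&2; return 1; }
    local n_fa n_fai
    n_fa=$(count_fasta_records "$fasta")
    n_fai=$(wc -l < "$fai" | tr -d ' ')
    [ "$n_fa" -eq "$n_fai" ]
}
```

For example, `check_fai_matches Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa && echo "index OK"` prints a confirmation only when every sequence has an index entry.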

Build the GMAP database

Usage

gmap_build [options...] -d <genomename> <fasta_files>

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

| ALPS Queue Name | CPU Time | Max Memory | Duration |
| --- | --- | --- | --- |
| 16G | 78.21 sec. | - | 1 minute 41 seconds |

Options

-d Genome name.

-D Destination directory.

-k k-mer value for genomic index (allowed: 15 or less).

Align Trinity transcripts to genome using GMAP

Usage

gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]

Execute
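The command for this step is not shown here. A minimal sketch consistent with the options documented in this section (-t, -n, -D, -d) might look like the following. The Trinity assembly path, the output SAM name, and the thread count are assumptions, and `-f samse` is added so that GMAP writes SAM for the conversion step that follows.

```shell
# Assumed names: the Trinity assembly path and the output SAM file are hypothetical.
GENOME_DB="gmap_spo"                         # database name given to gmap_build -d
GENOME_DIR="."                               # directory given to gmap_build -D
TRANSCRIPTS="trinity_out_dir/Trinity.fasta"  # assumption: default Trinity output location
OUT_SAM="trinity_gmap.sam"                   # assumption: SAM later converted to trinity_gmap.bam
THREADS=4                                    # assumption: adjust to the cores you request

# -f samse requests SAM output; -n 0 reports a single alignment
# plus chimeric alignments.
GMAP_CMD="gmap -t ${THREADS} -n 0 -D ${GENOME_DIR} -d ${GENOME_DB} -f samse ${TRANSCRIPTS} > ${OUT_SAM}"

# Print the bsub submission line for review instead of running it.
echo "bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \"${GMAP_CMD}\""
```

Printing the line first makes it easy to check paths before submitting; paste the printed line into the shell from ~/LSLNGS2015/Trinity once it looks right.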

Resource usage

| ALPS Queue Name | CPU Time | Max Memory | Duration |
| --- | --- | --- | --- |
| 16G | 11.90 sec. | - | 7 seconds |

Options

-t Number of worker threads.

-n Maximum number of paths to show. If set to 0, GMAP reports a single alignment plus chimeric alignments.

-D Genome directory (the location of the database created by gmap_build).

-d Genome database.

Convert SAM to sorted BAM and create index

Convert SAM to BAM and sort by coordinates
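The conversion command is not shown here. One possible sketch, assuming samtools 1.3 or later and assuming the GMAP output was saved as trinity_gmap.sam, pipes the BAM conversion into a coordinate sort:

```shell
IN_SAM="trinity_gmap.sam"    # assumption: SAM file written by the GMAP step
OUT_BAM="trinity_gmap.bam"   # name expected by the samtools index command in this section

# samtools view -b converts SAM to BAM; samtools sort orders records by coordinate.
SORT_CMD="samtools view -b ${IN_SAM} | samtools sort -o ${OUT_BAM} -"

# Run only when samtools and the input file are both present;
# otherwise just show the command.
if command -v samtools >/dev/null 2>&1 && [ -f "$IN_SAM" ]; then
    sh -c "$SORT_CMD"
else
    echo "would run: $SORT_CMD"
fi
```

Older samtools releases (0.1.x) use a different `sort` syntax that takes an output prefix, so check `samtools sort` usage on your system first.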

After the BAM file is generated, create its index

samtools index trinity_gmap.bam

Build the STAR database
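The command for this step is not shown here. As a hedged sketch, STAR builds its index with `--runMode genomeGenerate`. For small genomes the STAR manual suggests lowering `--genomeSAindexNbases` to min(14, log2(GenomeLength)/2 - 1), which the helper functions below compute from the .fai index. The function names, the `star_index` directory, and the thread count are assumptions.

```shell
# Total genome length: sum of column 2 (sequence length) in a samtools .fai index.
genome_length_from_fai() {
    awk '{ total += $2 } END { printf "%d\n", total }' "$1"
}

# STAR manual suggestion for small genomes:
# genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1), rounded down here.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)
        if (n > 14) n = 14
        print n
    }'
}

# Assumed locations; STAR_INDEX_DIR is a hypothetical directory name.
GENOME_FASTA="GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
STAR_INDEX_DIR="star_index"

# Print the index-building command for review rather than running it.
print_star_index_cmd() {
    nbases=$(sa_index_nbases "$(genome_length_from_fai "${GENOME_FASTA}.fai")")
    echo "STAR --runMode genomeGenerate --genomeDir ${STAR_INDEX_DIR} --genomeFastaFiles ${GENOME_FASTA} --genomeSAindexNbases ${nbases} --runThreadN 4"
}
```

Call `print_star_index_cmd` from ~/LSLNGS2015/Trinity after creating the index directory with `mkdir -p star_index`. For a genome of about 12.5 million bases the formula gives 10.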

Resource usage

| ALPS Queue Name | CPU Time | Max Memory | Duration |
| --- | --- | --- | --- |
| 16G | 40.19 sec. | - | 29 seconds |

Align RNA-Seq reads to genome using STAR
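The alignment command is not shown here. A sketch that would produce the star_alignment/Aligned.sortedByCoord.out.bam file indexed later in this section could look like the following. The read file names, the index directory, and the thread count are hypothetical placeholders.

```shell
STAR_INDEX_DIR="star_index"     # hypothetical: directory holding the STAR index
READS_1="reads_1.fastq"         # hypothetical paired-end read files
READS_2="reads_2.fastq"
OUT_PREFIX="star_alignment/"    # yields star_alignment/Aligned.sortedByCoord.out.bam

# --outSAMtype BAM SortedByCoordinate makes STAR write a coordinate-sorted BAM directly.
STAR_CMD="STAR --runThreadN 4 --genomeDir ${STAR_INDEX_DIR} --readFilesIn ${READS_1} ${READS_2} --outSAMtype BAM SortedByCoordinate --outFileNamePrefix ${OUT_PREFIX}"

# Print the commands for review instead of running them.
# The output directory is created first in case STAR does not create it.
echo "mkdir -p ${OUT_PREFIX}"
echo "bsub -q 16G -o ./star.std -e ./star.err -J star \"${STAR_CMD}\""
```

If the reads are gzip-compressed, STAR also needs `--readFilesCommand zcat`.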

Resource usage

| ALPS Queue Name | CPU Time | Max Memory | Duration |
| --- | --- | --- | --- |
| 16G | 157.56 sec. | 4 GB | 1 minute 33 seconds |

After the BAM file is generated, create its index

samtools index star_alignment/Aligned.sortedByCoord.out.bam

List files

Use ls -la to list the files in your Trinity folder
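Beyond reading the listing by eye, a small check can report which expected outputs are absent. The sketch below is illustrative; `report_missing` is a hypothetical helper, and the file list in the comment assumes the output names used in this tutorial.

```shell
# Print each expected file that is missing; return non-zero if any are absent.
report_missing() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call from ~/LSLNGS2015/Trinity:
# report_missing trinity_gmap.bam trinity_gmap.bam.bai \
#     star_alignment/Aligned.sortedByCoord.out.bam \
#     star_alignment/Aligned.sortedByCoord.out.bam.bai
```

The .bai files are the indexes written by samtools index, which IGV needs alongside each BAM file.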

Visualize data using IGV

The Integrative Genomics Viewer (IGV) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.

Download data file to your computer

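Use WinSCP or a similar SFTP client to download the following files from your remote server (such as ALPS1) to your local PC or laptop:

  1. GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa

  2. GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3

  3. trinity_gmap.bam and its index trinity_gmap.bam.bai

  4. star_alignment/Aligned.sortedByCoord.out.bam and its index star_alignment/Aligned.sortedByCoord.out.bam.bai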
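If a command-line client is available on the local machine, the files that IGV loads later in this section can also be copied with scp. The sketch below only prints the commands. The login, host name, and remote directory are hypothetical and must be replaced with your own.

```shell
# Hypothetical login and host; replace with your own account on the remote server.
REMOTE="user@remote.example.org"
REMOTE_DIR="LSLNGS2015/Trinity"   # relative to the remote home directory

# Files IGV needs: genome FASTA, annotation, both BAM files and their .bai indexes.
FILES="GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai"

# Print one scp command per file so the list can be reviewed before copying.
print_scp_commands() {
    local f
    for f in $FILES; do
        echo "scp ${REMOTE}:${REMOTE_DIR}/${f} ."
    done
}
```

Running `print_scp_commands` on the local machine lists the copy commands; keep each .bai file in the same folder as its BAM file so IGV can find it.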

Launch IGV via Java Web Start

Go to the IGV downloads page: http://software.broadinstitute.org/software/igv/download. When prompted, register or log in as requested. Click the launch icon to initiate the web start launch window.

Install IGV

After launching IGV

  1. Go to Genomes -> Load Genome from File, and select Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa. Click Open.

  2. Go to File -> Load from File, and select Aligned.sortedByCoord.out.bam, trinity_gmap.bam and Schizosaccharomyces_pombe.ASM294v2.33.gff3. Click Open.

Change the chromosome range to I:577,956-583,212.

In the track showing Aligned.sortedByCoord.out.bam, right-click and choose Color alignments by > read strand to display the paired-end reads in blue and red according to strand.

IGV in action
