Compare de novo reconstructed transcripts to reference annotations

Download Schizosaccharomyces pombe genome and annotation

mkdir ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data

wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
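
The index written by samtools faidx is a tab-separated text file whose first two columns are the sequence name and its length. As a quick sanity check, a minimal sketch along these lines lists them; the function name list_contigs is hypothetical and the example path assumes the directory layout used above. The names it prints are the ones IGV will expect later.

```shell
# Print each reference sequence name and its length (bp) from a samtools .fai index.
# Columns 1 and 2 of a .fai file are the sequence name and the sequence length.
list_contigs() {
    local fai="$1"
    awk -F'\t' '{ printf "%s\t%d\n", $1, $2 }' "$fai"
}

# Example (path assumed from the download step above):
# list_contigs ~/LSLNGS2015/Trinity/GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
```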

Build the GMAP database

Usage

gmap_build [options...] -d <genomename> <fasta_files>

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

ALPS Queue Name   CPU Time     Max Memory   Duration
16G               78.21 sec.   -            1 minute 41 seconds

Options

-d Genome name.

-D Destination directory.

-k k-mer value for genomic index (allowed: 15 or less).

Align Trinity transcripts to genome using GMAP

Usage

gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \
"gmap -t 6 -n 0 -D . -d gmap_spo trinity_reference/Trinity.fasta -f samse > trinity_gmap.sam"

Resource usage

ALPS Queue Name   CPU Time     Max Memory   Duration
16G               11.90 sec.   -            7 seconds

Options

-t Number of worker threads.

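-n Maximum number of paths to show. If set to 0, GMAP reports a single alignment plus any chimeric alignments.

-D Genome directory (the directory that contains the GMAP database).

-d Genome database.

-f Output format. samse writes SAM output without setting the paired-end flag bit.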
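
To get a rough idea of how many Trinity transcripts GMAP placed on the genome, the SAM output can be summarised from its FLAG column. The sketch below is only an illustration: the function name sam_mapping_summary is hypothetical, it counts distinct query names rather than records (a chimeric transcript can produce more than one record), and whether unmapped records appear at all depends on GMAP's output settings.

```shell
# Count distinct query names that have at least one mapped record, and distinct
# query names reported as unmapped, using the SAM FLAG field (column 2).
# FLAG bit 0x4 set means the record is unmapped.
sam_mapping_summary() {
    local sam="$1"
    awk -F'\t' '
        /^@/ { next }                                   # skip header lines
        int($2 / 4) % 2 == 1 { unmapped[$1] = 1; next } # bit 0x4 is set
        { mapped[$1] = 1 }
        END {
            m = 0; u = 0
            for (k in mapped) m++
            for (k in unmapped) u++
            printf "mapped\t%d\nunmapped\t%d\n", m, u
        }
    ' "$sam"
}

# Example (file produced by the gmap job above):
# sam_mapping_summary trinity_gmap.sam
```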

Convert SAM to sorted BAM and create index

Convert SAM to BAM and sort by coordinates

bsub -q 16G -J sort \
"samtools sort -T trinity_gmap -o trinity_gmap.bam trinity_gmap.sam"

Once the sort job has finished and the BAM file exists, create the index

samtools index trinity_gmap.bam
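
Indexing only works on a coordinate-sorted BAM, so it can be worth confirming the sort order recorded in the header. The following is a minimal sketch, assuming the header carries an @HD line with an SO tag; the helper name is_coordinate_sorted is hypothetical. It reads header text from standard input so that it can be fed from samtools view -H.

```shell
# Read a SAM/BAM header from standard input and succeed only if its @HD line
# declares coordinate sort order (SO:coordinate).
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Example (requires samtools and the BAM produced above):
# samtools view -H trinity_gmap.bam | is_coordinate_sorted && echo "sorted by coordinate"
```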

Build the STAR genome index

cd ~/LSLNGS2015/Trinity
mkdir star_spo

bsub -q 16G -o ./star_build.std -e ./star_build.err -J star_build \
"STAR --runThreadN 6 --runMode genomeGenerate --genomeDir star_spo \
--genomeFastaFiles GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

ALPS Queue Name   CPU Time     Max Memory   Duration
16G               40.19 sec.   -            29 seconds
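
The build command above relies on STAR's default index settings. For small genomes the STAR manual suggests scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1). The sketch below computes that value from the .fai index created earlier; it is an optional aid, not part of the original workflow, and the function name suggest_sa_index_nbases is hypothetical.

```shell
# Suggest a value for STAR's --genomeSAindexNbases from a samtools .fai index,
# following the STAR manual's rule of thumb: min(14, log2(GenomeLength)/2 - 1).
suggest_sa_index_nbases() {
    local fai="$1"
    awk -F'\t' '
        { total += $2 }                         # column 2 = sequence length
        END {
            if (total <= 0) exit 1              # empty or unreadable index
            n = int(log(total) / log(2) / 2 - 1)
            if (n > 14) n = 14
            print n
        }
    ' "$fai"
}

# Example (path assumed from the download step):
# suggest_sa_index_nbases GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
```

If the suggested value is lower than 14, it could be passed as --genomeSAindexNbases <value> in the genomeGenerate command.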

Align RNA-Seq reads to genome using STAR

mkdir star_alignment

bsub -q 16G -o ./star_map.std -e ./star_map.err -J star_map \
"STAR --genomeDir star_spo \
--readFilesIn RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz \
--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 10 \
--outSAMunmapped None --twopassMode Basic \
--runThreadN 6 --outFileNamePrefix \"star_alignment/\""

Resource usage

ALPS Queue Name   CPU Time      Max Memory   Duration
16G               157.56 sec.   4 GB         1 minute 33 seconds

Once the alignment job has finished and the BAM file exists, create the index

samtools index star_alignment/Aligned.sortedByCoord.out.bam
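
STAR also writes a summary file, Log.final.out, next to the BAM (here under the star_alignment/ prefix). A small helper such as the one sketched below can pull a single statistic out of it. It assumes the usual "<metric name> |<TAB><value>" line layout of that file; the function name star_metric is hypothetical.

```shell
# Print the value of a named metric from STAR's Log.final.out summary.
# Each metric line is assumed to look like "<metric name> |<TAB><value>".
star_metric() {
    local log="$1" metric="$2"
    awk -F'|' -v name="$metric" '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim whitespace around the value
            if (key == name) print val
        }
    ' "$log"
}

# Example:
# star_metric star_alignment/Log.final.out "Uniquely mapped reads %"
```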

List files

Use ls -la to list the files in your Trinity folder

drwxrwxr-x 14 ycl6 ycl6   4096 Oct 27 12:52 .
drwxrwxr-x  6 ycl6 ycl6   4096 Nov  1 13:44 ..
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 11:58 GENOME_data
drwxrwxr-x  3 ycl6 ycl6   4096 Oct 27 10:55 gmap_spo
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:19 RNASEQ_data
drwxrwxr-x  4 ycl6 ycl6   4096 Oct 27 12:36 star_alignment
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:32 star_spo
-rw-rw-r--  1 ycl6 ycl6 132734 Oct 27 12:25 trinity_gmap.bam
-rw-rw-r--  1 ycl6 ycl6   9872 Oct 27 12:27 trinity_gmap.bam.bai
-rw-rw-r--  1 ycl6 ycl6 454354 Oct 27 12:23 trinity_gmap.sam
drwxrwxr-x  4 ycl6 ycl6   4096 Oct 27 12:41 trinity_reference
-rw-rw-r--  1 ycl6 ycl6  21181 Oct 27 12:22 trinity_reference.log

Visualize data using IGV

The Integrative Genomics Viewer (IGV) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.

Download data file to your computer

Use WinSCP or a similar SFTP client to download the following files from the remote server (for example, ALPS1) to your local PC or laptop:

GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai
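
Before starting the transfer, it may help to confirm on the server that all seven files exist and are not empty. The following is a minimal sketch to run from the Trinity directory; the function name check_igv_files is hypothetical and the file list simply mirrors the one above.

```shell
# Report which of the files needed for IGV are missing or empty under a directory
# (default: the current directory). Returns non-zero if any file is missing.
check_igv_files() {
    local dir="${1:-.}" missing=0 f
    local files=(
        GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
        GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
        GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
        trinity_gmap.bam
        trinity_gmap.bam.bai
        star_alignment/Aligned.sortedByCoord.out.bam
        star_alignment/Aligned.sortedByCoord.out.bam.bai
    )
    for f in "${files[@]}"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# cd ~/LSLNGS2015/Trinity && check_igv_files .
```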

Launch IGV via Java Web Start

Go to the IGV downloads page: http://software.broadinstitute.org/software/igv/download. When prompted, register or log in as requested. Click the launch icon to initiate the web start launch window.

After launching IGV

  1. Go to Genomes -> Load Genome from File, and select Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa. Click Open.

  2. Go to File -> Load from File, and select Aligned.sortedByCoord.out.bam, trinity_gmap.bam and Schizosaccharomyces_pombe.ASM294v2.33.gff3. Click Open.

Change the chromosome range to I:577,956-583,212.

In the track showing Aligned.sortedByCoord.out.bam, right-click and choose Color alignments by > read strand so that the paired-end reads are drawn in red or blue according to their strand.
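
To see on the command line which annotated features fall inside the region viewed in IGV, the GFF3 file can be filtered by coordinates. The sketch below is an illustration only: the function name gff_features_in_region is hypothetical, and the default feature type "gene" is an assumption about how the Ensembl GFF3 labels genes (other types can be passed as the fifth argument).

```shell
# Print start, end, strand and attributes of GFF3 features of a given type
# that overlap a region. GFF3 columns: 1 = seqid, 3 = type, 4 = start,
# 5 = end (1-based, inclusive), 7 = strand, 9 = attributes.
gff_features_in_region() {
    local gff="$1" seqid="$2" start="$3" end="$4" type="${5:-gene}"
    awk -F'\t' -v c="$seqid" -v s="$start" -v e="$end" -v t="$type" '
        /^#/ { next }                                   # skip comments and directives
        $1 == c && $3 == t && $4 + 0 <= e + 0 && $5 + 0 >= s + 0 {
            print $4 "\t" $5 "\t" $7 "\t" $9
        }
    ' "$gff"
}

# Example (region from the IGV step, written without thousands separators):
# gff_features_in_region GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3 I 577956 583212 gene
```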
