Compare de novo reconstructed transcripts to reference annotations

Download Schizosaccharomyces pombe genome and annotation

mkdir ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data

wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
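
As a quick sanity check on the index, the second column of a .fai file holds the length of each sequence, so summing it gives the total genome size. The following is a minimal sketch; fai_total_length is a hypothetical helper name used for illustration, not part of samtools.

```shell
# Sum the sequence lengths (column 2) of a samtools .fai index.
# fai_total_length is a hypothetical helper name, not a samtools command.
fai_total_length() {
    awk -F'\t' '{ total += $2 } END { print total + 0 }' "$1"
}

# Usage: runs only if the index created by "samtools faidx" is present.
fai=Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
if [ -f "$fai" ]; then
    echo "Sequences:   $(wc -l < "$fai")"
    echo "Total bases: $(fai_total_length "$fai")"
fi
```

If the total is far from the expected size of the assembly, the download or decompression step probably did not complete.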

Build the GMAP database

Usage

gmap_build [options...] -d <genomename> <fasta_files>

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

ALPS Queue Name | CPU Time | Max Memory | Duration
16G | 78.21 sec. | - | 1 minute 41 seconds

Options

-d Genome name.
-D Destination directory.
-k k-mer value for genomic index (allowed: 15 or less).

Align Trinity transcripts to genome using GMAP

Usage

gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]

Execute

cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \
"gmap -t 6 -n 0 -D . -d gmap_spo trinity_reference/Trinity.fasta -f samse > trinity_gmap.sam"

Resource usage

ALPS Queue Name | CPU Time | Max Memory | Duration
16G | 11.90 sec. | - | 7 seconds

Options

-t Number of worker threads.
-n Maximum number of paths to show. If set to 0, GMAP reports single alignment plus chimeric alignments.
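-D Genome directory (the directory that holds the database built by gmap_build).
-d Genome database.
-f Output format. samse writes SAM output for single-end reads.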
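
To see how many Trinity transcripts GMAP placed on the genome, the FLAG column of the SAM output can be tallied. The sketch below is an illustration under stated assumptions: sam_mapping_summary is a hypothetical helper name, and it counts SAM records rather than distinct transcripts, so a chimeric transcript may contribute more than one record.

```shell
# Count mapped and unmapped primary records in a SAM file.
# sam_mapping_summary is a hypothetical helper name used for illustration.
sam_mapping_summary() {
    awk -F'\t' '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (int(flag / 256) % 2 == 1) next     # skip secondary alignments (0x100)
            if (int(flag / 2048) % 2 == 1) next    # skip supplementary alignments (0x800)
            if (int(flag / 4) % 2 == 1) unmapped++ # 0x4 set: record is unmapped
            else mapped++
        }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}

# Usage: runs only if the GMAP output is present.
if [ -f trinity_gmap.sam ]; then
    sam_mapping_summary trinity_gmap.sam
fi
```

The bit tests use integer division instead of bitwise functions so that the script does not depend on a particular awk implementation.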

Convert SAM to sorted BAM and create index

Convert SAM to BAM and sort by coordinates:
bsub -q 16G -J sort \
"samtools sort -T trinity_gmap -o trinity_gmap.bam trinity_gmap.sam"
After the sort job has finished and the BAM file exists, create its index:
samtools index trinity_gmap.bam
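
Indexing expects a coordinate-sorted BAM file, so it can help to confirm the sort order recorded in the header before running samtools index. A minimal sketch, assuming the @HD header line written by samtools sort; is_coordinate_sorted is a hypothetical helper name.

```shell
# Succeed if the SAM/BAM header text read from stdin declares coordinate sorting.
# is_coordinate_sorted is a hypothetical helper name used for illustration.
is_coordinate_sorted() {
    grep '^@HD' | grep -q 'SO:coordinate'
}

# Usage: runs only if samtools and the sorted BAM file are available.
if command -v samtools >/dev/null 2>&1 && [ -f trinity_gmap.bam ]; then
    if samtools view -H trinity_gmap.bam | is_coordinate_sorted; then
        echo "trinity_gmap.bam is coordinate-sorted"
    else
        echo "trinity_gmap.bam is not marked as coordinate-sorted" >&2
    fi
fi
```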

Build the STAR genome index

cd ~/LSLNGS2015/Trinity
mkdir star_spo

bsub -q 16G -o ./star_build.std -e ./star_build.err -J star_build \
"STAR --runThreadN 6 --runMode genomeGenerate --genomeDir star_spo \
--genomeFastaFiles GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"

Resource usage

ALPS Queue Name | CPU Time | Max Memory | Duration
16G | 40.19 sec. | - | 29 seconds

Align RNA-Seq reads to genome using STAR

mkdir star_alignment

bsub -q 16G -o ./star_map.std -e ./star_map.err -J star_map \
"STAR --genomeDir star_spo \
--readFilesIn RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz \
--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 10 \
--outSAMunmapped None --twopassMode Basic \
--runThreadN 6 --outFileNamePrefix \"star_alignment/\""

Resource usage

ALPS Queue Name | CPU Time | Max Memory | Duration
16G | 157.56 sec. | 4 GB | 1 minute 33 seconds

After the mapping job has finished and the BAM file exists, create its index:
samtools index star_alignment/Aligned.sortedByCoord.out.bam
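
STAR writes a mapping summary to Log.final.out under the output prefix, with one "label | value" pair per line. The sketch below pulls single values out of that file; star_log_field is a hypothetical helper name, and the two labels shown are assumed to match the wording used by your STAR version.

```shell
# Print the value of one field from a STAR Log.final.out file.
# star_log_field is a hypothetical helper name used for illustration.
# $1 = log file, $2 = field label exactly as it appears left of the "|".
star_log_field() {
    awk -F'|' -v label="$2" '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim whitespace around the value
            if (key == label) { print val; exit }
        }
    ' "$1"
}

# Usage: runs only if the STAR summary log is present.
log=star_alignment/Log.final.out
if [ -f "$log" ]; then
    echo "Uniquely mapped: $(star_log_field "$log" "Uniquely mapped reads %")"
    echo "Multi-mapped:    $(star_log_field "$log" "% of reads mapped to multiple loci")"
fi
```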

List files

Use ls -la to list the files in your Trinity folder:
drwxrwxr-x 14 ycl6 ycl6 4096 Oct 27 12:52 .
drwxrwxr-x 6 ycl6 ycl6 4096 Nov 1 13:44 ..
drwxrwxr-x 2 ycl6 ycl6 4096 Oct 27 11:58 GENOME_data
drwxrwxr-x 3 ycl6 ycl6 4096 Oct 27 10:55 gmap_spo
drwxrwxr-x 2 ycl6 ycl6 4096 Oct 27 12:19 RNASEQ_data
drwxrwxr-x 4 ycl6 ycl6 4096 Oct 27 12:36 star_alignment
drwxrwxr-x 2 ycl6 ycl6 4096 Oct 27 12:32 star_spo
-rw-rw-r-- 1 ycl6 ycl6 132734 Oct 27 12:25 trinity_gmap.bam
-rw-rw-r-- 1 ycl6 ycl6 9872 Oct 27 12:27 trinity_gmap.bam.bai
-rw-rw-r-- 1 ycl6 ycl6 454354 Oct 27 12:23 trinity_gmap.sam
drwxrwxr-x 4 ycl6 ycl6 4096 Oct 27 12:41 trinity_reference
-rw-rw-r-- 1 ycl6 ycl6 21181 Oct 27 12:22 trinity_reference.log

Visualize data using IGV

The Integrative Genomics Viewer (IGV) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.

Download data file to your computer

Use WinSCP or a similar SFTP client to download the following files from the remote server (for example ALPS1) to your local PC or laptop:
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai
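
Before starting the transfer, it may be worth confirming on the server that every file in the list exists. A minimal sketch, meant to be run from ~/LSLNGS2015/Trinity; report_missing is a hypothetical helper name.

```shell
# Print every path from the argument list that is not an existing regular file.
# report_missing is a hypothetical helper name used for illustration.
report_missing() {
    for f in "$@"; do
        [ -f "$f" ] || echo "$f"
    done
}

# Usage: run from ~/LSLNGS2015/Trinity on the server.
missing=$(report_missing \
    GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa \
    GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai \
    GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3 \
    trinity_gmap.bam \
    trinity_gmap.bam.bai \
    star_alignment/Aligned.sortedByCoord.out.bam \
    star_alignment/Aligned.sortedByCoord.out.bam.bai)

if [ -n "$missing" ]; then
    echo "Files not found:"
    echo "$missing"
else
    echo "All files are ready to download."
fi
```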

Launch IGV via Java Web Start

Go to the IGV downloads page: http://software.broadinstitute.org/software/igv/download. When prompted, register or log in as requested. Click the launch icon to initiate the web start launch window.
[Figure: Install IGV]

After launching IGV

    1. Go to Genomes -> Load Genome from File, and select Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa. Click Open.
    2. Go to File -> Load from File, and select Aligned.sortedByCoord.out.bam, trinity_gmap.bam and Schizosaccharomyces_pombe.ASM294v2.33.gff3. Click Open.
    3. Change the chromosome range to I:577,956-583,212.
    4. In the track showing Aligned.sortedByCoord.out.bam, right-click and choose Color alignments by > read strand to display the paired-end reads in red and blue according to strand.
[Figure: IGV in action]
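
The same locus can also be inspected on the command line on the server. IGV displays coordinates with thousands separators, so the sketch below first strips them to get a plain region string and then counts the records overlapping that region. igv_to_region is a hypothetical helper name, and the count step assumes samtools and the indexed BAM file are available.

```shell
# Convert an IGV-style locus (with thousands separators) to a plain region string.
# igv_to_region is a hypothetical helper name used for illustration.
igv_to_region() {
    printf '%s\n' "$1" | tr -d ', '
}

region=$(igv_to_region "I:577,956-583,212")
echo "$region"    # prints I:577956-583212

# Usage: count Trinity transcript alignments overlapping the region.
if command -v samtools >/dev/null 2>&1 && [ -f trinity_gmap.bam ]; then
    samtools view -c trinity_gmap.bam "$region"
fi
```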