Compare de novo reconstructed transcripts to reference annotations
Download Schizosaccharomyces pombe genome and annotation
mkdir ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
Builds the GMAP database
Usage
gmap_build [options...] -d <genomename> <fasta_files>
Execute
cd ~/LSLNGS2015/Trinity
bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
Resource usage
ALPS Queue Name
CPU Time
Max Memory
Duration
16G
78.21 sec.
-
1 minute 41 seconds
Options
-d
Genome name.
-D
Destination directory.
-k
k-mer value for genomic index (allowed: 15 or less).
Align Trinity transcripts to genome using GMAP
Usage
gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]
Execute
cd ~/LSLNGS2015/Trinity
bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \
"gmap -t 6 -n 0 -D . -d gmap_spo trinity_reference/Trinity.fasta -f samse > trinity_gmap.sam"
Resource usage
ALPS Queue Name
CPU Time
Max Memory
Duration
16G
11.90 sec.
-
7 seconds
Options
-t
Number of worker threads.
-n
Maximum number of paths to show. If set to 0, GMAP reports single alignment plus chimeric alignments.
-D
Destination directory.
-d
Genome database.
Convert SAM to sorted BAM and create index
Convert SAM to BAM and sort by coordinates
bsub -q 16G -J sort \
"samtools sort -T trinity_gmap -o trinity_gmap.bam trinity_gmap.sam"
After BAM is generated, create index
samtools index trinity_gmap.bam
Builds the STAR database
cd ~/LSLNGS2015/Trinity
mkdir star_spo
bsub -q 16G -o ./star_build.std -e ./star_build.err -J star_build \
"STAR --runThreadN 6 --runMode genomeGenerate --genomeDir star_spo \
--genomeFastaFiles GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
Resource usage
ALPS Queue Name
CPU Time
Max Memory
Duration
16G
40.19 sec.
-
29 seconds
Align RNA-Seq reads to genome using STAR
mkdir star_alignment
bsub -q 16G -o ./star_map.std -e ./star_map.err -J star_map \
"STAR --genomeDir star_spo \
--readFilesIn RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz \
--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 10 \
--outSAMunmapped None --twopassMode Basic \
--runThreadN 6 --outFileNamePrefix \"star_alignment/\""
Resource usage
ALPS Queue Name
CPU Time
Max Memory
Duration
16G
157.56 sec.
4 GB
1 minute 33 seconds
After BAM is generated, create index
samtools index star_alignment/Aligned.sortedByCoord.out.bam
List files
Use ls -la
to list the files in your Trinity folder
drwxrwxr-x 14 ycl6 ycl6 4096 Oct 27 12:52 .
drwxrwxr-x 6 ycl6 ycl6 4096 Nov 1 13:44 ..
drwxrwxr-x 2 ycl6 ycl6 4096 Oct 27 11:58 GENOME_data
drwxrwxr-x 3 ycl6 ycl6 4096 Oct 27 10:55 gmap_spo
drwxrwxr-x 2 ycl6 ycl6 4096 Oct 27 12:19 RNASEQ_data
drwxrwxr-x 4 ycl6 ycl6 4096 Oct 27 12:36 star_alignment
drwxrwxr-x 2 ycl6 ycl6 4096 Oct 27 12:32 star_spo
-rw-rw-r-- 1 ycl6 ycl6 132734 Oct 27 12:25 trinity_gmap.bam
-rw-rw-r-- 1 ycl6 ycl6 9872 Oct 27 12:27 trinity_gmap.bam.bai
-rw-rw-r-- 1 ycl6 ycl6 454354 Oct 27 12:23 trinity_gmap.sam
drwxrwxr-x 4 ycl6 ycl6 4096 Oct 27 12:41 trinity_reference
-rw-rw-r-- 1 ycl6 ycl6 21181 Oct 27 12:22 trinity_reference.log
Visualize data using IGV
The Integrative Genomics Viewer (IGV) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.
Download data file to your computer
Please use WinSCP or similar sFTP clients to download the following files from your off-site server such as ALPS1 to local PC or laptop:
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai
Launch IGV via Java Web Start
Go to the IGV downloads page: http://software.broadinstitute.org/software/igv/download. When prompted, register or log in as requested. Click the launch icon to initiate the web start launch window.
After launching IGV
Go to Genomes -> Load Genome from File, and select Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa. Click Open.
Go to File -> Load from File, and select Aligned.sortedByCoord.out.bam, trinity_gmap.bam and Schizosaccharomyces_pombe.ASM294v2.33.gff3. Click Open.
Change the chromosome range to I:577,956-583,212
.
In the window showing Aligned.sortedByCoord.out.bam, you can right click > Color alignments by > read strand, to change reads to Blue and Red colors for your paired-end reads.
Last updated
Was this helpful?