# Compare de novo reconstructed transcripts to reference annotations

### Download *Schizosaccharomyces pombe* genome and annotation

```
mkdir ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data

wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
```

```
gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
```

## Builds the GMAP database

### Usage

`gmap_build [options...] -d <genomename> <fasta_files>`

### Execute

```
cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
```

### Resource usage

| ALPS Queue Name | CPU Time   | Max Memory | Duration            |
| --------------- | ---------- | ---------- | ------------------- |
| 16G             | 78.21 sec. | -          | 1 minute 41 seconds |

### Options

`-d` Genome name.

`-D` Destination directory.

`-k` k-mer value for genomic index (allowed: 15 or less).

## Align Trinity transcripts to genome using GMAP

### Usage

`gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]`

### Execute

```
cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \
"gmap -t 6 -n 0 -D . -d gmap_spo trinity_reference/Trinity.fasta -f samse > trinity_gmap.sam"
```

### Resource usage

| ALPS Queue Name | CPU Time   | Max Memory | Duration  |
| --------------- | ---------- | ---------- | --------- |
| 16G             | 11.90 sec. | -          | 7 seconds |

### Options

`-t` Number of worker threads.

`-n` Maximum number of paths to show. If set to 0, GMAP reports single alignment plus chimeric alignments.

`-D` Destination directory.

`-d` Genome database.

### Convert SAM to sorted BAM and create index

Convert SAM to BAM and sort by coordinates

```
bsub -q 16G -J sort \
"samtools sort -T trinity_gmap -o trinity_gmap.bam trinity_gmap.sam"
```

After BAM is generated, create index

`samtools index trinity_gmap.bam`

## Builds the STAR database

```
cd ~/LSLNGS2015/Trinity
mkdir star_spo

bsub -q 16G -o ./star_build.std -e ./star_build.err -J star_build \
"STAR --runThreadN 6 --runMode genomeGenerate --genomeDir star_spo \
--genomeFastaFiles GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
```

### Resource usage

| ALPS Queue Name | CPU Time   | Max Memory | Duration   |
| --------------- | ---------- | ---------- | ---------- |
| 16G             | 40.19 sec. | -          | 29 seconds |

## Align RNA-Seq reads to genome using STAR

```
mkdir star_alignment

bsub -q 16G -o ./star_map.std -e ./star_map.err -J star_map \
"STAR --genomeDir star_spo \
--readFilesIn RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz \
--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate  --outFilterMultimapNmax 10 \
--outSAMunmapped None --twopassMode Basic \
--runThreadN 6 --outFileNamePrefix \"star_alignment/\""
```

### Resource usage

| ALPS Queue Name | CPU Time    | Max Memory | Duration            |
| --------------- | ----------- | ---------- | ------------------- |
| 16G             | 157.56 sec. | 4 GB       | 1 minute 33 seconds |

After BAM is generated, create index

`samtools index star_alignment/Aligned.sortedByCoord.out.bam`

### List files

Use `ls -la` to list the files in your Trinity folder

```
drwxrwxr-x 14 ycl6 ycl6   4096 Oct 27 12:52 .
drwxrwxr-x  6 ycl6 ycl6   4096 Nov  1 13:44 ..
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 11:58 GENOME_data
drwxrwxr-x  3 ycl6 ycl6   4096 Oct 27 10:55 gmap_spo
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:19 RNASEQ_data
drwxrwxr-x  4 ycl6 ycl6   4096 Oct 27 12:36 star_alignment
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:32 star_spo
-rw-rw-r-- 1 ycl6 ycl6 132734 Oct 27 12:25 trinity_gmap.bam
-rw-rw-r-- 1 ycl6 ycl6   9872 Oct 27 12:27 trinity_gmap.bam.bai
-rw-rw-r-- 1 ycl6 ycl6 454354 Oct 27 12:23 trinity_gmap.sam
drwxrwxr-x 4 ycl6 ycl6   4096 Oct 27 12:41 trinity_reference
-rw-rw-r-- 1 ycl6 ycl6  21181 Oct 27 12:22 trinity_reference.log
```

## Visualize data using IGV

The [Integrative Genomics Viewer (IGV)](http://software.broadinstitute.org/software/igv/) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.

### Download data file to your computer

Please use WinSCP or similar sFTP clients to download the following files from your off-site server such as **ALPS1** to local PC or laptop:

```
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai
```

### Launch IGV via Java Web Start

Go to the IGV downloads page: <http://software.broadinstitute.org/software/igv/download>. When prompted, register or log in as requested. Click the **launch** icon to initiate the web start launch window.

![Install IGV](https://2175465529-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M44VuvT-GOxd2-xMygM%2F-M44Vv9Erqxz_QwrCvEL%2F-M44VzRNSKNOPPFwSCqR%2Figv1.png?generation=1586009223633005\&alt=media)

### After launching IGV

1. Go to **Genomes** -> **Load Genome from File**, and select *Schizosaccharomyces\_pombe.ASM294v2.dna.toplevel.fa*. Click Open.
2. Go to **File** -> **Load from File**, and select *Aligned.sortedByCoord.out.bam*, *trinity\_gmap.bam* and *Schizosaccharomyces\_pombe.ASM294v2.33.gff3*. Click Open.

Change the chromosome range to `I:577,956-583,212`.

In the window showing **Aligned.sortedByCoord.out.bam**, you can right click > Color alignments by > read strand, to change reads to Blue and Red colors for your paired-end reads.

![IGV in action](https://2175465529-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M44VuvT-GOxd2-xMygM%2F-M44Vv9Erqxz_QwrCvEL%2F-M44VzRPjRfwpR3QC_va%2Figv2.png?generation=1586009224630604\&alt=media)
