# Compare de novo reconstructed transcripts to reference annotations

### Download *Schizosaccharomyces pombe* genome and annotation

```
mkdir ~/LSLNGS2015/Trinity/GENOME_data
cd ~/LSLNGS2015/Trinity/GENOME_data

wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/fasta/schizosaccharomyces_pombe/dna/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
wget ftp://ftp.ensemblgenomes.org/pub/fungi/release-33/gff3/schizosaccharomyces_pombe/Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz
```

```
gunzip Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.gz
gunzip Schizosaccharomyces_pombe.ASM294v2.33.gff3.gz

samtools faidx Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
```

## Builds the GMAP database

### Usage

`gmap_build [options...] -d <genomename> <fasta_files>`

### Execute

```
cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap_build.std -e ./gmap_build.err -J gmap_build \
"gmap_build -d gmap_spo -D . -k 13 Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
```

### Resource usage

| ALPS Queue Name | CPU Time   | Max Memory | Duration            |
| --------------- | ---------- | ---------- | ------------------- |
| 16G             | 78.21 sec. | -          | 1 minute 41 seconds |

### Options

`-d` Genome name.

`-D` Destination directory.

`-k` k-mer value for genomic index (allowed: 15 or less).

## Align Trinity transcripts to genome using GMAP

### Usage

`gmap [OPTIONS...] <FASTA files...>, or cat <FASTA files...> | gmap [OPTIONS...]`

### Execute

```
cd ~/LSLNGS2015/Trinity

bsub -q 16G -o ./gmap.std -e ./gmap.err -J gmap \
"gmap -t 6 -n 0 -D . -d gmap_spo trinity_reference/Trinity.fasta -f samse > trinity_gmap.sam"
```

### Resource usage

| ALPS Queue Name | CPU Time   | Max Memory | Duration  |
| --------------- | ---------- | ---------- | --------- |
| 16G             | 11.90 sec. | -          | 7 seconds |

### Options

`-t` Number of worker threads.

`-n` Maximum number of paths to show. If set to 0, GMAP reports single alignment plus chimeric alignments.

`-D` Destination directory.

`-d` Genome database.

### Convert SAM to sorted BAM and create index

Convert SAM to BAM and sort by coordinates

```
bsub -q 16G -J sort \
"samtools sort -T trinity_gmap -o trinity_gmap.bam trinity_gmap.sam"
```

After BAM is generated, create index

`samtools index trinity_gmap.bam`

## Builds the STAR database

```
cd ~/LSLNGS2015/Trinity
mkdir star_spo

bsub -q 16G -o ./star_build.std -e ./star_build.err -J star_build \
"STAR --runThreadN 6 --runMode genomeGenerate --genomeDir star_spo \
--genomeFastaFiles GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa"
```

### Resource usage

| ALPS Queue Name | CPU Time   | Max Memory | Duration   |
| --------------- | ---------- | ---------- | ---------- |
| 16G             | 40.19 sec. | -          | 29 seconds |

## Align RNA-Seq reads to genome using STAR

```
mkdir star_alignment

bsub -q 16G -o ./star_map.std -e ./star_map.err -J star_map \
"STAR --genomeDir star_spo \
--readFilesIn RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz \
--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate  --outFilterMultimapNmax 10 \
--outSAMunmapped None --twopassMode Basic \
--runThreadN 6 --outFileNamePrefix \"star_alignment/\""
```

### Resource usage

| ALPS Queue Name | CPU Time    | Max Memory | Duration            |
| --------------- | ----------- | ---------- | ------------------- |
| 16G             | 157.56 sec. | 4 GB       | 1 minute 33 seconds |

After BAM is generated, create index

`samtools index star_alignment/Aligned.sortedByCoord.out.bam`

### List files

Use `ls -la` to list the files in your Trinity folder

```
drwxrwxr-x 14 ycl6 ycl6   4096 Oct 27 12:52 .
drwxrwxr-x  6 ycl6 ycl6   4096 Nov  1 13:44 ..
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 11:58 GENOME_data
drwxrwxr-x  3 ycl6 ycl6   4096 Oct 27 10:55 gmap_spo
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:19 RNASEQ_data
drwxrwxr-x  4 ycl6 ycl6   4096 Oct 27 12:36 star_alignment
drwxrwxr-x  2 ycl6 ycl6   4096 Oct 27 12:32 star_spo
-rw-rw-r-- 1 ycl6 ycl6 132734 Oct 27 12:25 trinity_gmap.bam
-rw-rw-r-- 1 ycl6 ycl6   9872 Oct 27 12:27 trinity_gmap.bam.bai
-rw-rw-r-- 1 ycl6 ycl6 454354 Oct 27 12:23 trinity_gmap.sam
drwxrwxr-x 4 ycl6 ycl6   4096 Oct 27 12:41 trinity_reference
-rw-rw-r-- 1 ycl6 ycl6  21181 Oct 27 12:22 trinity_reference.log
```

## Visualize data using IGV

The [Integrative Genomics Viewer (IGV)](http://software.broadinstitute.org/software/igv/) is a Java-based visualization tool for interactive exploration of large, integrated genomic datasets.

### Download data file to your computer

Please use WinSCP or similar sFTP clients to download the following files from your off-site server such as **ALPS1** to local PC or laptop:

```
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.dna.toplevel.fa.fai
GENOME_data/Schizosaccharomyces_pombe.ASM294v2.33.gff3
trinity_gmap.bam
trinity_gmap.bam.bai
star_alignment/Aligned.sortedByCoord.out.bam
star_alignment/Aligned.sortedByCoord.out.bam.bai
```

### Launch IGV via Java Web Start

Go to the IGV downloads page: <http://software.broadinstitute.org/software/igv/download>. When prompted, register or log in as requested. Click the **launch** icon to initiate the web start launch window.

![Install IGV](/files/-M44VzRNSKNOPPFwSCqR)

### After launching IGV

1. Go to **Genomes** -> **Load Genome from File**, and select *Schizosaccharomyces\_pombe.ASM294v2.dna.toplevel.fa*. Click Open.
2. Go to **File** -> **Load from File**, and select *Aligned.sortedByCoord.out.bam*, *trinity\_gmap.bam* and *Schizosaccharomyces\_pombe.ASM294v2.33.gff3*. Click Open.

Change the chromosome range to `I:577,956-583,212`.

In the window showing **Aligned.sortedByCoord.out.bam**, you can right click > Color alignments by > read strand, to change reads to Blue and Red colors for your paired-end reads.

![IGV in action](/files/-M44VzRPjRfwpR3QC_va)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ycl6.gitbook.io/rna-seq-data-analysis/de_novo_assembly_using_trinity/compare_de_novo_reconstructed_transcripts_to_reference_annotations.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
