# Visualization

We can use STAR `--runMode inputAlignmentsFromBAM` to generate the RNA-Seq "signal" files using the coordinate-sorted BAM file (`Aligned.sortedByCoord.out.bam`). It produces the number of reads mapped to each genomic position. The output file (in bedGraph format) can be visualized in genome browsers such as the [UCSC genomic browser](http://genome.ucsc.edu/) or [IGV browser](http://software.broadinstitute.org/software/igv/).

### Execute

```
cd ~/LSLNGS2015/

STAR --runMode inputAlignmentsFromBAM --runThreadN 6 \ 
--inputBAMfile RNASEQ_data/star_GM12878_rep1/Aligned.sortedByCoord.out.bam --outWigType bedGraph \ 
--outWigStrand Stranded --outWigNorm RPM --outFileNamePrefix RNASEQ_data/GM1287_rep1_

STAR --runMode inputAlignmentsFromBAM --runThreadN 6 \ 
--inputBAMfile RNASEQ_data/star_GM12878_rep2/Aligned.sortedByCoord.out.bam --outWigType bedGraph \ 
--outWigStrand Stranded --outWigNorm RPM --outFileNamePrefix RNASEQ_data/GM1287_rep2_
```

### Resource usage

| Sample  | ALPS Queue Name | CPU Time     | Max Memory | Duration             |
| ------- | --------------- | ------------ | ---------- | -------------------- |
| GM12878 | 16G             | 2223.97 sec. | 7 GB       | 37 minutes 8 seconds |
| K562    | 16G             | 2703.06 sec. | 7 Gb       | 45 minutes 8 seconds |

### Options

`--runThreadN` number of threads to run STAR

`--inputBAMfile` path to coordinate-sorted BAM file

`--outWigType` type of signal output (bedGraph or wiggle)

`--outWigStrand` strandedness of wiggle/bedGraph output (Stranded or Unstranded)

`--outWigNorm` type of normalization for the signal (RPM or none)

`--outFileNamePrefix` output files name prefix

### Take a look at the generated signal files

`ls -la ~/LSLNGS2015/RNASEQ_data/*_Signal.*.bg`

```
-rw------- 1 u00ycl06 u00ycm02 813667524 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 848273078 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 778409908 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 817934615 2015-11-16 00:03 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep1_Signal.Unique.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 787034556 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 820179772 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 753637183 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 791210099 2015-11-16 00:20 /work3/LSLNGS2015/RNASEQ_data/GM1287_rep2_Signal.Unique.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 315665648 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 319934522 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 295364571 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 300931098 2015-11-16 00:04 /work3/LSLNGS2015/RNASEQ_data/K562_rep1_Signal.Unique.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 444649222 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.UniqueMultiple.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 450522698 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.UniqueMultiple.str2.out.bg
-rw------- 1 u00ycl06 u00ycm02 415331427 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.Unique.str1.out.bg
-rw------- 1 u00ycl06 u00ycm02 423833653 2015-11-16 00:28 /work3/LSLNGS2015/RNASEQ_data/K562_rep2_Signal.Unique.str2.out.bg
```

## How to visualize on genome browsers

You can load bedGraph into IGV. Alternatively, you can also convert the bedGraph to bigWig using [bedGraphToBigWig](https://www.encodeproject.org/software/bedgraphtobigwig/) and load bigWig to IGV or UCSC Genome Browser.

### bedGraph to bigWig

Create the chrom.sizes file from FASTA sequence

`samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa`

Using `bedGraphToBigWig` to perform conversion

`bedGraphToBigWig in.bedGraph chrom.sizes out.bw`

Unfortunately, the bedGraph might not be sorted properly. If you encounter error when executing `bedGraphToBigWig`, such as:

> GM1287\_rep1\_Signal.Unique.str1.out.bg is not case-sensitive sorted at line 28093639.\
> Please use "sort -k1,1 -k2,2n" with LC\_COLLATE=C, or bedSort and try again.

You can use `sort` to sort the file, such as:

`LC_COLLATE=C sort -k1,1 -k2,2n -o GM1287_rep1_Signal.Unique.str1.out.bg GM1287_rep1_Signal.Unique.str1.out.bg`

Then re-run the conversion

```
bedGraphToBigWig GM1287_rep1_Signal.Unique.str1.out.bg \ 
../GENOME_data/Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai \ 
GM1287_rep1_Signal.Unique.str1.out.bw
```

### View in UCSC Genome Browser (Requires access to webserver) *a bug regarding chr name*

First, you need to upload/place the bigWig files at a location accessible from the World Wide Web, i.e. on a web server, for UCSC Genome Browser to use and display it.

To view the bigWig as custom tracks, we create a text file (e.g. mytracks.txt) with following information.

```
track type=bigWig name=GM1287_rep1_str1 smoothingWindow=4 color=255,0,0 autoScale=on viewLimits=1:200 visibility=full windowingFunction=maximum bigDataUrl=https://domain_name_or_ip/folder_name/GM1287_rep1_Signal.UniqueMultiple.str1.out.bw

track type=bigWig name=GM1287_rep1_str2 smoothingWindow=4 color=0,0,255 autoScale=on viewLimits=1:200 visibility=full windowingFunction=maximum bigDataUrl=https://domain_name_or_ip/folder_name/GM1287_rep1_Signal.UniqueMultiple.str2.out.bw
```

Then, in your web browser, use the below URL to call your file:

`https://genome.ucsc.edu/cgi-bin/hgTracks?db=GRCh38&position=chr6:122657583-122663796&hgct_customText=https://domain_name_or_ip/folder_name/mytracks.txt`
