De novo assembly of RNA-Seq reads

De novo assembly of reads

First, we combine the raw reads from different conditions into a single FASTQ file (for each end) and use Trinity to generate a reference assembly.

* The commands below assume the fq.gz files are in the ~/LSLNGS2015/Trinity/RNASEQ_data folder. If the folder does not exist yet, create it and copy the files there first.

mkdir -p ~/LSLNGS2015/Trinity/RNASEQ_data/


cd ~/LSLNGS2015/Trinity

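# Write the combined files outside RNASEQ_data first so the globs cannot match their own output, then move them in.
# If you re-run this step, delete RNASEQ_data/sp.left.fq.gz and RNASEQ_data/sp.right.fq.gz beforehand.
zcat RNASEQ_data/*.left.fq.gz | gzip > sp.left.fq.gz
zcat RNASEQ_data/*.right.fq.gz | gzip > sp.right.fq.gz
mv sp.left.fq.gz sp.right.fq.gz RNASEQ_data/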
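
Because the left and right files are paired, it may be worth checking that the two combined files hold the same number of reads before submitting the job. The following is a minimal sketch, assuming standard four-line FASTQ records; count_reads is a hypothetical helper name and not part of Trinity.

```shell
# count_reads is a hypothetical helper (not part of Trinity).
# It assumes standard 4-line FASTQ records in a gzip-compressed file.
count_reads() {
    # gzip -dc is equivalent to zcat
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Compare only when both combined files from the previous step are present
if [ -f RNASEQ_data/sp.left.fq.gz ] && [ -f RNASEQ_data/sp.right.fq.gz ]; then
    left=$(count_reads RNASEQ_data/sp.left.fq.gz)
    right=$(count_reads RNASEQ_data/sp.right.fq.gz)
    echo "left: $left reads, right: $right reads"
    [ "$left" -eq "$right" ] || echo "WARNING: read counts differ" >&2
fi
```

If the two counts differ, one of the input files is probably truncated or missing its mate file, and it is better to sort that out before running the assembly.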

bsub -q 16G -o ./trinity_reference.std -e ./trinity_reference.err -J Trinity \
"Trinity --seqType fq --SS_lib_type RF \
--left RNASEQ_data/sp.left.fq.gz --right RNASEQ_data/sp.right.fq.gz \
--CPU 10 --max_memory 16G --output trinity_reference >& trinity_reference.log"

If the submission succeeds, you will see a message like "Job <xxxxxx> is submitted to queue <16G>."

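You can use bjobs to check your job status. Each line of output shows the job ID, user, status, queue, submission host, execution host, job name, and submission time: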

993732  s00ycm0 RUN   16G        alps1       8*alps1-40  Trinity    Nov 18 22:38


The assembled transcripts can be found at trinity_reference/Trinity.fasta.
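
Once the job has finished, a quick way to confirm that the assembly was written is to count the FASTA records in it. This is a sketch, assuming each transcript is one FASTA record whose header line starts with ">"; count_transcripts is a hypothetical helper name.

```shell
# count_transcripts is a hypothetical helper: it counts FASTA header lines.
count_transcripts() {
    grep -c '^>' "$1"
}

fasta=trinity_reference/Trinity.fasta
if [ -s "$fasta" ]; then
    echo "$fasta contains $(count_transcripts "$fasta") transcripts"
else
    # The .err and .log files are the ones named in the bsub command above
    echo "$fasta is missing or empty; check trinity_reference.err and trinity_reference.log" >&2
fi
```

The count should agree with the "Total trinity transcripts" line in the assembly statistics.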

Examine assembly stats

A utility script in the Trinity distribution reports basic statistics about the assembly produced by Trinity. The numbers may vary slightly between runs, as the assembly results are not deterministic.


Locate util/ in the trinityrnaseq-2.2.0 distribution, and run

PATH_TO_TRINITY/util/ trinity_reference/Trinity.fasta


## Counts of transcripts, etc.
Total trinity 'genes':  377
Total trinity transcripts:      385
Percent GC: 38.65

Stats based on ALL transcript contigs:

        Contig N10: 3373
        Contig N20: 2605
        Contig N30: 2219
        Contig N40: 1936
        Contig N50: 1703

        Median contig length: 772
        Average contig: 1046.43
        Total assembled bases: 402875

## Stats based on ONLY LONGEST ISOFORM per 'GENE':

        Contig N10: 3373
        Contig N20: 2605
        Contig N30: 2219
        Contig N40: 1936
        Contig N50: 1682

        Median contig length: 765
        Average contig: 1038.27
        Total assembled bases: 391428
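
For readers unfamiliar with the Nx values above: under the usual definition, N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The sketch below computes it directly from a FASTA file under that definition. contig_n50 is a hypothetical helper name, and the Trinity script may differ in details such as how ties are handled.

```shell
# contig_n50 is a hypothetical helper, not part of Trinity.
# It prints the N50 of the sequences in a FASTA file.
contig_n50() {
    # Step 1: print one sequence length per FASTA record
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen) print len }' "$1" |
    # Step 2: sort lengths from longest to shortest
    sort -rn |
    # Step 3: walk down the list until half of the total bases are covered
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 cum += lens[i]
                 if (cum >= total / 2) { print lens[i]; exit }
             }
         }'
}

# Run on the assembly only if it exists and is non-empty
if [ -s trinity_reference/Trinity.fasta ]; then
    contig_n50 trinity_reference/Trinity.fasta
fi
```

The other Nx values follow the same pattern with a different fraction of the total; N10, for example, stops once 10% of the bases are covered, which is why it is always at least as large as N50.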

