De novo assembly of RNA-Seq reads

De novo assembly of reads

First, we combine the raw reads from the different conditions into one FASTQ file per end (left and right), then use Trinity to generate a reference assembly from the combined reads.

* This assumes the fq.gz files have already been placed in the Trinity/RNASEQ_data folder. If the folder does not exist yet, create it and copy the files in first:

mkdir -p ~/LSLNGS2015/Trinity/RNASEQ_data/

Execute

cd ~/LSLNGS2015/Trinity

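# Remove combined files left over from an earlier run; otherwise the globs below would pick them up as input
rm -f RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz

zcat RNASEQ_data/*.left.fq.gz | gzip > RNASEQ_data/sp.left.fq.gz
zcat RNASEQ_data/*.right.fq.gz | gzip > RNASEQ_data/sp.right.fq.gz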

bsub -q 16G -o ./trinity_reference.std -e ./trinity_reference.err -J Trinity \
"Trinity --seqType fq --SS_lib_type RF \
--left RNASEQ_data/sp.left.fq.gz --right RNASEQ_data/sp.right.fq.gz \
--CPU 10 --max_memory 16G --output trinity_reference >& trinity_reference.log"

You will get a message like "Job <xxxxxx> is submitted to queue <16G>." confirming that your submission was successful.

You can use bjobs to check your job status.
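Paired-end assembly relies on the left and right files holding the same number of reads, so a quick count of the combined files can catch a bad concatenation early. The following is a minimal sketch, not part of the original workflow: the function names count_reads and check_pairs are hypothetical, and it assumes plain 4-line FASTQ records.

```shell
# Count the reads in a gzip-compressed FASTQ file.
# Assumes standard 4-line records (no wrapped sequence lines).
# gzip -dc is equivalent to zcat.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Compare the read counts of a left/right pair.
# Prints both counts and returns non-zero if they differ.
check_pairs() {
    left=$(count_reads "$1")
    right=$(count_reads "$2")
    echo "left=$left right=$right"
    [ "$left" = "$right" ]
}
```

For the files above, this could be called as check_pairs RNASEQ_data/sp.left.fq.gz RNASEQ_data/sp.right.fq.gz. A fractional count would suggest a truncated or malformed file.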

Resource usage

| ALPS Queue Name | CPU Time | Max Memory | Duration |
|---|---|---|---|
| 16G | 2097.24 sec. | 2 GB | 6 minutes 37 seconds |

The assembled transcripts can be found at trinity_reference/Trinity.fasta.

Examine assembly stats

The TrinityStats.pl script will report the basic statistics about the assembly produced by Trinity. The numbers may vary slightly, as the assembly results are not deterministic.
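To get a feel for what such statistics mean, the sketch below computes three of the simplest ones directly from a FASTA file: the number of sequences, the total number of bases, and the N50 (the length L such that sequences of length L or longer hold at least half of all bases). It is an illustration only and not a replacement for TrinityStats.pl; the function name fasta_stats is hypothetical.

```shell
# Print the number of sequences, total bases, and N50 for a FASTA file.
# Sequences wrapped over several lines are handled; empty records are skipped.
fasta_stats() {
    # Step 1: emit one length per sequence.
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the sorted lengths until half the bases are covered.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_bases=%d N50=%d\n", NR, total, n50
        }
    '
}
```

Running fasta_stats trinity_reference/Trinity.fasta prints a single summary line. The figures should be comparable to the contig-level numbers from TrinityStats.pl, though that script also reports gene-level and GC statistics that this sketch does not attempt.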

Execute

Locate util/TrinityStats.pl in the trinityrnaseq-2.2.0 distribution and run it on the assembly, replacing PATH_TO_TRINITY with the location of that distribution:

PATH_TO_TRINITY/util/TrinityStats.pl trinity_reference/Trinity.fasta

Output
