Alignment-based method
STAR alignment
Per-sample 2-pass mapping is enabled with
--twopassMode Basicand the--sjdbOverhangoption is set to 150 (the same value used to generate genome index here)Alignment is run with 6 threads
--runThreadN 6The
--quantMode TranscriptomeSAMoption generates alignments translated into transcript coordinates in theAligned.toTranscriptome.out.bamfile necessary for downstream quantification using the Salmon/RSEM workflowThe
--quantMode GeneCountsoption allows STAR to count the number reads per gene while mapping and outputs theReadsPerGene.out.tabcount tableWith
--quantMode TranscriptomeSAM GeneCounts, STAR produces both theAligned.toTranscriptome.out.bamandReadsPerGene.out.taboutputsThe
--quantTranscriptomeBan IndelSoftclipSingleendoption (default) satisfies RSEM requirements, i.e. soft-clipping or indels are not allowed. However, it can be changed to--quantTranscriptomeBan Singleendwhen using other quantification software such as Salmon and eXpress.Detailed descriptions of all the parameters and options are available in the STARmanual.pdf
$ cd /home/USER/SSAPs
$ mkdir star
$ declare -a runname=("ERR2675454" "ERR2675455" "ERR2675458" "ERR2675459" "ERR2675460" "ERR2675461" "ERR2675464" "ERR2675465" "ERR2675468" "ERR2675469" "ERR2675472" "ERR2675473" "ERR2675476" "ERR2675477" "ERR2675478" "ERR2675479" "ERR2675480" "ERR2675481" "ERR2675484" "ERR2675485")
# Modified from ENCODE-DCC/rna-seq-pipeline; https://github.com/ENCODE-DCC/rna-seq-pipeline
for id in ${runname[@]}; do
trim1=trimmed/${id}_1.fastq.gz
trim2=trimmed/${id}_2.fastq.gz
out=star/${id}_
STAR --genomeDir /home/USER/db/refanno/gencode.v33_star-2.7.3a \
--readFilesIn $trim1 $trim2 \
--outFileNamePrefix $out \
--readFilesCommand zcat \
--runThreadN 6 \
--genomeLoad NoSharedMemory \
--twopassMode Basic \
--sjdbGTFfile /home/USER/db/refanno/gencode.v33.annotation.gtf \
--sjdbScore 2 \
--sjdbOverhang 150 \
--limitSjdbInsertNsj 1000000 \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.04 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--outSAMunmapped Within \
--outFilterType BySJout \
--outSAMattributes NH HI AS NM MD \
--outSAMtype BAM SortedByCoordinate \
--quantMode TranscriptomeSAM GeneCounts \
--quantTranscriptomeBan IndelSoftclipSingleend \
--limitBAMsortRAM 32000000000
done
rm -R star/*_STARgenome star/*_STARpass1The percentaged of uniquely mapped reads for ERR2675454 is around 89.5%, with multi-mappers contributing to about 8.6% of the alignment
With --quantMode GeneCounts, STAR outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:
column 1: gene ID
column 2: counts for unstranded RNA-seq (htseq-count option
--stranded no)column 3: counts for the 1st read strand aligned with RNA (htseq-count option
--stranded yes)column 4: counts for the 2nd read strand aligned with RNA (htseq-count option
--stranded reverse)
STAR-Salmon Quantification (alignment-based mode)
Alignment with STAR to the target genome, followed by quantification using Salmon with 6 threads
STAR-RSEM Quantification
Alignment with STAR to the target genome, followed by quantification using RSEM with 6 threads. The --strandedness reverse was used to allow RSEM to quantify with the correct strandedness setting
Last updated
Was this helpful?