Alignment-based method

STAR alignment

  • Per-sample 2-pass mapping is enabled with --twopassMode Basic and the --sjdbOverhang option is set to 150 (the same value used to generate genome index here)

  • Alignment is run with 6 threads --runThreadN 6

  • The --quantMode TranscriptomeSAM option generates alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file necessary for downstream quantification using the Salmon/RSEM workflow

  • The --quantMode GeneCounts option allows STAR to count the number reads per gene while mapping and outputs the ReadsPerGene.out.tab count table

  • With --quantMode TranscriptomeSAM GeneCounts, STAR produces both the Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab outputs

  • The --quantTranscriptomeBan IndelSoftclipSingleend option (default) satisfies RSEM requirements, i.e. soft-clipping or indels are not allowed. However, it can be changed to--quantTranscriptomeBan Singleend when using other quantification software such as Salmon and eXpress.

  • Detailed descriptions of all the parameters and options are available in the STARmanual.pdf

$ cd /home/USER/SSAPs
$ mkdir star

$ declare -a runname=("ERR2675454" "ERR2675455" "ERR2675458" "ERR2675459" "ERR2675460" "ERR2675461" "ERR2675464" "ERR2675465" "ERR2675468" "ERR2675469" "ERR2675472" "ERR2675473" "ERR2675476" "ERR2675477" "ERR2675478" "ERR2675479" "ERR2675480" "ERR2675481" "ERR2675484" "ERR2675485")

# Modified from ENCODE-DCC/rna-seq-pipeline; https://github.com/ENCODE-DCC/rna-seq-pipeline
for id in ${runname[@]}; do
        trim1=trimmed/${id}_1.fastq.gz
        trim2=trimmed/${id}_2.fastq.gz
        out=star/${id}_

        STAR --genomeDir /home/USER/db/refanno/gencode.v33_star-2.7.3a \
        --readFilesIn $trim1 $trim2 \
        --outFileNamePrefix $out \
        --readFilesCommand zcat \
        --runThreadN 6 \
        --genomeLoad NoSharedMemory \
        --twopassMode Basic \
        --sjdbGTFfile /home/USER/db/refanno/gencode.v33.annotation.gtf \
        --sjdbScore 2 \
        --sjdbOverhang 150 \
        --limitSjdbInsertNsj 1000000 \
        --outFilterMultimapNmax 20 \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --outFilterMismatchNmax 999 \
        --outFilterMismatchNoverReadLmax 0.04 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000 \
        --outSAMunmapped Within \
        --outFilterType BySJout \
        --outSAMattributes NH HI AS NM MD \
        --outSAMtype BAM SortedByCoordinate \
        --quantMode TranscriptomeSAM GeneCounts \
        --quantTranscriptomeBan IndelSoftclipSingleend \
        --limitBAMsortRAM 32000000000
done
rm -R star/*_STARgenome star/*_STARpass1

The percentaged of uniquely mapped reads for ERR2675454 is around 89.5%, with multi-mappers contributing to about 8.6% of the alignment

With --quantMode GeneCounts, STAR outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:

  • column 1: gene ID

  • column 2: counts for unstranded RNA-seq (htseq-count option --stranded no)

  • column 3: counts for the 1st read strand aligned with RNA (htseq-count option --stranded yes)

  • column 4: counts for the 2nd read strand aligned with RNA (htseq-count option --stranded reverse)

STAR-Salmon Quantification (alignment-based mode)

Alignment with STAR to the target genome, followed by quantification using Salmon with 6 threads

STAR-RSEM Quantification

Alignment with STAR to the target genome, followed by quantification using RSEM with 6 threads. The --strandedness reverse was used to allow RSEM to quantify with the correct strandedness setting

Last updated

Was this helpful?