Per-sample 2-pass mapping is enabled with --twopassMode Basic and the --sjdbOverhang option is set to 150 (the same value used to generate genome index here)
Alignment is run with 6 threads --runThreadN 6
The --quantMode TranscriptomeSAM option generates alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file necessary for downstream quantification using the Salmon/RSEM workflow
The --quantMode GeneCounts option allows STAR to count the number reads per gene while mapping and outputs the ReadsPerGene.out.tab count table
With --quantMode TranscriptomeSAM GeneCounts, STAR produces both the Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab outputs
The --quantTranscriptomeBan IndelSoftclipSingleend option (default) satisfies RSEM requirements, i.e. soft-clipping or indels are not allowed. However, it can be changed to--quantTranscriptomeBan Singleend when using other quantification software such as Salmon and eXpress.
Detailed descriptions of all the parameters and options are available in the STARmanual.pdf
$cd/home/USER/SSAPs$mkdirstar$declare-arunname=("ERR2675454""ERR2675455""ERR2675458""ERR2675459""ERR2675460""ERR2675461""ERR2675464""ERR2675465""ERR2675468""ERR2675469""ERR2675472""ERR2675473""ERR2675476""ERR2675477""ERR2675478""ERR2675479""ERR2675480""ERR2675481""ERR2675484""ERR2675485")# Modified from ENCODE-DCC/rna-seq-pipeline; https://github.com/ENCODE-DCC/rna-seq-pipelinefor id in ${runname[@]}; do trim1=trimmed/${id}_1.fastq.gz trim2=trimmed/${id}_2.fastq.gz out=star/${id}_STAR--genomeDir/home/USER/db/refanno/gencode.v33_star-2.7.3a \--readFilesIn $trim1 $trim2 \--outFileNamePrefix $out \--readFilesCommandzcat \--runThreadN6 \--genomeLoadNoSharedMemory \--twopassModeBasic \--sjdbGTFfile/home/USER/db/refanno/gencode.v33.annotation.gtf \--sjdbScore2 \--sjdbOverhang150 \--limitSjdbInsertNsj1000000 \--outFilterMultimapNmax20 \--alignSJoverhangMin8 \--alignSJDBoverhangMin1 \--outFilterMismatchNmax999 \--outFilterMismatchNoverReadLmax0.04 \--alignIntronMin20 \--alignIntronMax1000000 \--alignMatesGapMax1000000 \--outSAMunmappedWithin \--outFilterTypeBySJout \--outSAMattributesNHHIASNMMD \--outSAMtypeBAMSortedByCoordinate \--quantModeTranscriptomeSAMGeneCounts \--quantTranscriptomeBanIndelSoftclipSingleend \--limitBAMsortRAM32000000000donerm-Rstar/*_STARgenomestar/*_STARpass1
The percentaged of uniquely mapped reads for ERR2675454 is around 89.5%, with multi-mappers contributing to about 8.6% of the alignment
With --quantMode GeneCounts, STAR outputs read counts per gene into ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options:
column 1: gene ID
column 2: counts for unstranded RNA-seq (htseq-count option --stranded no)
column 3: counts for the 1st read strand aligned with RNA (htseq-count option --stranded yes)
column 4: counts for the 2nd read strand aligned with RNA (htseq-count option --stranded reverse)
Alignment with STAR to the target genome, followed by quantification using Salmon with 6 threads
$cd/home/USER/SSAPs$declare-arunname=("ERR2675454""ERR2675455""ERR2675458""ERR2675459""ERR2675460""ERR2675461""ERR2675464""ERR2675465""ERR2675468""ERR2675469""ERR2675472""ERR2675473""ERR2675476""ERR2675477""ERR2675478""ERR2675479""ERR2675480""ERR2675481""ERR2675484""ERR2675485")for id in ${runname[@]}; do bam=star/${id}_Aligned.toTranscriptome.out.bamsalmonquant--threads6 \--targets/home/USER/db/refanno/gencode.v33.transcripts.fa \--gencode \--libTypeISR \--outputsalmon_star/$id \--alignments $bamdone
Alignment with STAR to the target genome, followed by quantification using RSEM with 6 threads. The --strandedness reverse was used to allow RSEM to quantify with the correct strandedness setting
$cd/home/USER/SSAPs$mkdirrsem_star$declare-arunname=("ERR2675454""ERR2675455""ERR2675458""ERR2675459""ERR2675460""ERR2675461""ERR2675464""ERR2675465""ERR2675468""ERR2675469""ERR2675472""ERR2675473""ERR2675476""ERR2675477""ERR2675478""ERR2675479""ERR2675480""ERR2675481""ERR2675484""ERR2675485")for id in ${runname[@]}; do bam=star/${id}_Aligned.toTranscriptome.out.bam out=rsem_star/${id} log=rsem_star/${id}.logrsem-calculate-expression-p6--paired-end--alignments \--estimate-rspd \--calc-ci \--ci-memory32000 \--seed123456 \--no-bam-output \--strandednessreverse \ $bam /home/USER/db/refanno/gencode.v33_rsem-1.3.3 $out > $logdone
# RSEM parameters--paired-end-inputreadsarepaired-endreads--alignments-inputfilecontainsalignmentsinSAM/BAM/CRAMformat--estimate-rspd-estimatethereadstartpositiondistributionfromdata--calc-ci-calculate95%credibilityintervals (CI) and posterior mean estimates (PME)--ci-memory32000-maximummemory (MB) of the auxiliary buffer used for computing CI--seed123456-settheseedfortherandomnumbergeneratorsusedincalculatingPMEandCI--no-bam-output-donotoutputanyBAMfile--strandednessreverse-definesthestrandednessoftheRNA-Seqreads
cut -f1-8 rsem_star/ERR2675454.genes.results | head