Per-sample 2-pass mapping is enabled with --twopassMode Basic, and the --sjdbOverhang option is set to 150 (the same value used to generate the genome index)
Alignment is run with 6 threads (--runThreadN 6)
The --quantMode TranscriptomeSAM option generates alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file, which is required for downstream quantification with the Salmon or RSEM workflow
The --quantMode GeneCounts option makes STAR count the number of reads per gene while mapping and write the ReadsPerGene.out.tab count table
With --quantMode TranscriptomeSAM GeneCounts, STAR produces both the Aligned.toTranscriptome.out.bam and ReadsPerGene.out.tab outputs
The --quantTranscriptomeBan IndelSoftclipSingleend option (the default) satisfies RSEM requirements by prohibiting indels, soft-clipping, and single-end alignments in the transcriptomic output. It can be changed to --quantTranscriptomeBan Singleend, which prohibits only single-end alignments, when using other quantification software such as Salmon or eXpress.
Detailed descriptions of all the parameters and options are available in the STARmanual.pdf
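The options described above could be combined into a single STAR call roughly as follows. This is a minimal sketch, not the exact command used here: the function name star_align, the genome index directory, and the FASTQ file names are hypothetical placeholders, and --readFilesCommand zcat assumes gzipped input. The output prefix follows the star/ERR2675454_ naming seen in the log file name.

```shell
# Sketch of a STAR call using the options described above.
# star_align, the index directory and the FASTQ names are placeholders.
star_align() {
    sample=$1      # sample accession, e.g. ERR2675454
    genome_dir=$2  # STAR index built with --sjdbOverhang 150
    reads_1=$3     # FASTQ file, mate 1
    reads_2=$4     # FASTQ file, mate 2

    # When DRY_RUN is set, the command is printed instead of executed.
    ${DRY_RUN:+echo} STAR \
        --runThreadN 6 \
        --genomeDir "$genome_dir" \
        --readFilesIn "$reads_1" "$reads_2" \
        --readFilesCommand zcat \
        --twopassMode Basic \
        --sjdbOverhang 150 \
        --quantMode TranscriptomeSAM GeneCounts \
        --quantTranscriptomeBan IndelSoftclipSingleend \
        --outFileNamePrefix "star/${sample}_"
}

# Print the command without running STAR (subshell keeps DRY_RUN local).
( DRY_RUN=1; star_align ERR2675454 genome_index ERR2675454_1.fastq.gz ERR2675454_2.fastq.gz )
```

Dropping the DRY_RUN setting runs the alignment itself, provided STAR is on the PATH and the placeholder paths are replaced with real ones.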
The percentage of uniquely mapped reads for ERR2675454 is about 89.5%, and multi-mapped reads account for about 8.6% of the input reads
star/ERR2675454_Log.final.out
Started job on | Apr 29 12:03:14
Started mapping on | Apr 29 12:19:15
Finished on | Apr 29 12:44:10
Mapping speed, Million of reads per hour | 71.90
Number of input reads | 29857721
Average input read length | 260
UNIQUE READS:
Uniquely mapped reads number | 26729034
Uniquely mapped reads % | 89.52%
Average mapped length | 260.28
Number of splices: Total | 23442395
Number of splices: Annotated (sjdb) | 23441965
Number of splices: GT/AG | 23204758
Number of splices: GC/AG | 196371
Number of splices: AT/AC | 21311
Number of splices: Non-canonical | 19955
Mismatch rate per base, % | 0.23%
Deletion rate per base | 0.01%
Deletion average length | 1.74
Insertion rate per base | 0.01%
Insertion average length | 1.58
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 2553618
% of reads mapped to multiple loci | 8.55%
Number of reads mapped to too many loci | 5120
% of reads mapped to too many loci | 0.02%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 517770
% of reads unmapped: too short | 1.73%
Number of reads unmapped: other | 52179
% of reads unmapped: other | 0.17%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
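When many samples are processed, the headline numbers can be pulled out of each Log.final.out with a small awk filter. The sketch below prints the unique and multi-mapping percentages and also checks that the read categories add up to the input: for this log, 26729034 + 2553618 + 5120 + 0 + 517770 + 52179 = 29857721. The function name star_log_summary is a hypothetical helper, and the check assumes a log that reports unmapped read counts as well as percentages, as this one does.

```shell
# Summarise a STAR Log.final.out: print the mapping percentages and compare
# the sum of the read categories with the number of input reads.
# star_log_summary is a hypothetical helper name.
star_log_summary() {
    awk -F '|' '
        NF >= 2 {
            key = $1; val = $2
            # Strip the padding around the field name and the value.
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)

            if (key == "Number of input reads") total = val
            if (key == "Uniquely mapped reads number" ||
                key == "Number of reads mapped to multiple loci" ||
                key == "Number of reads mapped to too many loci" ||
                key ~ /^Number of reads unmapped: /) accounted += val
            if (key == "Uniquely mapped reads %" ||
                key == "% of reads mapped to multiple loci") print key "\t" val
        }
        END { printf "input reads\t%d\naccounted for\t%d\n", total, accounted }
    ' "$1"
}

# Usage (path as in the log shown above):
#   star_log_summary star/ERR2675454_Log.final.out
```

Chimeric reads are left out of the sum; they are reported in their own section and are zero in this run.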
With --quantMode GeneCounts, STAR writes per-gene read counts to the ReadsPerGene.out.tab file, whose 4 columns correspond to different strandedness options:
column 1: gene ID
column 2: counts for unstranded RNA-seq (htseq-count option --stranded no)
column 3: counts for the 1st read strand aligned with RNA (htseq-count option --stranded yes)
column 4: counts for the 2nd read strand aligned with RNA (htseq-count option --stranded reverse)
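One way to decide which column matches the library is to total the gene-level counts in columns 2 to 4 and compare them: for a stranded library, one of columns 3 and 4 should hold most of the reads, while the other stays close to zero. The sketch below assumes the usual file layout, in which the first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summary counters rather than genes. The function names strand_column_totals and gene_counts are hypothetical helpers.

```shell
# Total the gene-level counts in each strandedness column of a
# ReadsPerGene.out.tab file, skipping the four N_* summary rows at the top.
strand_column_totals() {
    awk -F '\t' '
        NR > 4 { sum2 += $2; sum3 += $3; sum4 += $4 }
        END {
            printf "column 2 (unstranded)\t%d\n", sum2
            printf "column 3 (forward)\t%d\n", sum3
            printf "column 4 (reverse)\t%d\n", sum4
        }
    ' "$1"
}

# Print gene ID and the counts from one chosen column (2, 3 or 4).
gene_counts() {
    awk -F '\t' -v col="$2" 'NR > 4 { print $1 "\t" $col }' "$1"
}

# Usage (path assumed from the star/ERR2675454_ output prefix):
#   strand_column_totals star/ERR2675454_ReadsPerGene.out.tab
#   gene_counts star/ERR2675454_ReadsPerGene.out.tab 4
```

For a reverse-stranded library, such as one quantified with --strandedness reverse in RSEM, column 4 would be the one to carry forward.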
Reads are aligned to the target genome with STAR and then quantified with RSEM using 6 threads. The --strandedness reverse option was passed so that RSEM quantifies with the correct strandedness setting
# RSEM parameters
--paired-end - input reads are paired-end reads
--alignments - input file contains alignments in SAM/BAM/CRAM format
--estimate-rspd - estimate the read start position distribution from data
--calc-ci - calculate 95% credibility intervals (CI) and posterior mean estimates (PME)
--ci-memory 32000 - maximum memory (MB) of the auxiliary buffer used for computing CI
--seed 123456 - set the seed for the random number generators used in calculating PME and CI
--no-bam-output - do not output any BAM file
--strandedness reverse - defines the strandedness of the RNA-Seq reads
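Put together, the parameters listed above correspond to an rsem-calculate-expression call along these lines. This is a sketch under assumptions: the function name rsem_quantify and the reference prefix are hypothetical placeholders, the input BAM path is inferred from the star/ERR2675454_ prefix, and the thread count is passed with --num-threads. The output prefix follows the rsem_star/ERR2675454 naming of the results file.

```shell
# Sketch of an RSEM quantification call using the parameters listed above.
# rsem_quantify and the reference prefix are placeholders.
rsem_quantify() {
    sample=$1     # sample accession, e.g. ERR2675454
    reference=$2  # prefix of the reference built with rsem-prepare-reference

    # When DRY_RUN is set, the command is printed instead of executed.
    ${DRY_RUN:+echo} rsem-calculate-expression \
        --paired-end \
        --alignments \
        --estimate-rspd \
        --calc-ci \
        --ci-memory 32000 \
        --seed 123456 \
        --no-bam-output \
        --strandedness reverse \
        --num-threads 6 \
        "star/${sample}_Aligned.toTranscriptome.out.bam" \
        "$reference" \
        "rsem_star/${sample}"
}

# Print the command without running RSEM (subshell keeps DRY_RUN local).
( DRY_RUN=1; rsem_quantify ERR2675454 rsem_reference/genome )
```

The three positional arguments are, in order, the transcriptome BAM written by STAR, the RSEM reference name, and the sample name used as the prefix for the output files.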
cut -f1-8 rsem_star/ERR2675454.genes.results | head