Alignment-free method

Salmon

  • We quantify in mapping-based mode. Salmon also offers an alignment-based mode, in which you align the raw reads with a separate aligner and supply the alignments (BAM format, in transcript coordinates) to Salmon; you can read more about this here

  • We use --libType ISR (inward-facing, stranded, read 1 from the reverse strand) because the sequencing libraries of this dataset were prepared following the Illumina TruSeq Stranded Total RNA protocol. Read here about choosing the appropriate libType

  • salmon quant is run with 6 threads (--threads 6)

$ cd /home/USER/SSAPs

$ declare -a runname=("ERR2675454" "ERR2675455" "ERR2675458" "ERR2675459" "ERR2675460" "ERR2675461" "ERR2675464" "ERR2675465" "ERR2675468" "ERR2675469" "ERR2675472" "ERR2675473" "ERR2675476" "ERR2675477" "ERR2675478" "ERR2675479" "ERR2675480" "ERR2675481" "ERR2675484" "ERR2675485")

for id in "${runname[@]}"; do
        trim1=trimmed/${id}_1.fastq.gz
        trim2=trimmed/${id}_2.fastq.gz

        salmon quant --threads 6 \
        --index /home/USER/db/refanno/gencode.v33_decoys_salmon-1.2.1 \
        --libType ISR \
        --gcBias \
        --output salmon/$id \
        --mates1 $trim1 --mates2 $trim2
done
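The loop above assumes that both trimmed mate files exist for every run ID. A minimal pre-flight sketch that checks this before starting a long quantification run is shown below; the function name check_inputs is hypothetical, and the file naming simply mirrors the trimmed/${id}_1.fastq.gz pattern used in the loop.

```shell
# Report every sample whose paired FASTQ files are missing or empty.
# Usage: check_inputs <fastq_dir> <run_id>...
# Returns non-zero if at least one file is missing (hypothetical helper).
check_inputs() {
    local dir=$1 id mate status=0
    shift
    for id in "$@"; do
        for mate in 1 2; do
            # -s is true only for files that exist and are non-empty
            if [ ! -s "$dir/${id}_${mate}.fastq.gz" ]; then
                echo "missing or empty: $dir/${id}_${mate}.fastq.gz"
                status=1
            fi
        done
    done
    return "$status"
}
```

It could be called as check_inputs trimmed "${runname[@]}" && echo "all inputs present" before the for loop.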
salmon/ERR2675454/quant.sf
Name	Length	EffectiveLength	TPM	NumReads
ENST00000456328.2	1657	1455.216	0.000000	0.000
ENST00000450305.2	632	468.000	0.000000	0.000
ENST00000488147.1	1351	1031.467	5.868714	134.186
ENST00000619216.1	68	9.000	0.000000	0.000
ENST00000473358.1	712	548.000	0.000000	0.000
ENST00000469289.1	535	371.000	0.000000	0.000
ENST00000607096.1	138	26.000	0.000000	0.000
ENST00000417324.1	1187	1023.000	0.000000	0.000
ENST00000461467.1	590	426.000	0.000000	0.000
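
To get a quick overview of a quant.sf file like the one above, the columns can be summarised with awk. This is only a sketch: the function name summarize_quant is hypothetical, and it assumes the tab-separated five-column layout shown above (TPM in column 4, NumReads in column 5).

```shell
# Summarise one Salmon quant.sf: number of transcripts listed,
# number with TPM > 0, and the sum of NumReads.
# Usage: summarize_quant <quant.sf>   (hypothetical helper)
summarize_quant() {
    awk -F'\t' '
        NR == 1 { next }                 # skip the header row
        { n++; reads += $5 }             # column 5 = NumReads
        $4 > 0 { expressed++ }           # column 4 = TPM
        END { printf "transcripts=%d expressed=%d reads=%.3f\n", n, expressed, reads }
    ' "$1"
}
```

For example, summarize_quant salmon/ERR2675454/quant.sf prints one summary line for that sample.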

Salmon with bootstrap

Inspired by kallisto, Salmon can also compute bootstrapped abundance estimates. Downstream tools that model quantification uncertainty, such as sleuth for differential expression analysis, can make use of these estimates.

Bootstrapping is enabled by passing --numBootstraps N, where N is a positive integer giving the number of bootstrap samples to compute. More samples give better estimates of variance but require more computation and time.

$ cd /home/USER/SSAPs

$ declare -a runname=("ERR2675454" "ERR2675455" "ERR2675458" "ERR2675459" "ERR2675460" "ERR2675461" "ERR2675464" "ERR2675465" "ERR2675468" "ERR2675469" "ERR2675472" "ERR2675473" "ERR2675476" "ERR2675477" "ERR2675478" "ERR2675479" "ERR2675480" "ERR2675481" "ERR2675484" "ERR2675485")

for id in "${runname[@]}"; do
        trim1=trimmed/${id}_1.fastq.gz
        trim2=trimmed/${id}_2.fastq.gz

        salmon quant --threads 6 \
        --index /home/USER/db/refanno/gencode.v33_decoys_salmon-1.2.1 \
        --libType ISR \
        --gcBias \
        --numBootstraps 100 \
        --output salmon-bs/$id \
        --mates1 $trim1 --mates2 $trim2
done
$ ls salmon-bs/ERR2675454/aux_info
ambig_info.tsv    exp_gc.gz       observed_bias_3p.gz
bootstrap         fld.gz          observed_bias.gz
expected_bias.gz  meta_info.json  obs_gc.gz

$ ls salmon-bs/ERR2675454/aux_info/bootstrap/
bootstraps.gz  names.tsv.gz
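
With 20 samples it is easy to miss one whose bootstrap output was not written. The sketch below lists the run IDs that lack the bootstraps.gz file shown above; the function name missing_bootstraps is hypothetical, and the path layout is taken from the listing above.

```shell
# Print the run IDs whose Salmon output has no (or an empty) bootstraps.gz.
# Usage: missing_bootstraps <salmon_out_dir> <run_id>...   (hypothetical helper)
missing_bootstraps() {
    local out=$1 id
    shift
    for id in "$@"; do
        if [ ! -s "$out/$id/aux_info/bootstrap/bootstraps.gz" ]; then
            echo "$id"
        fi
    done
}
```

Calling missing_bootstraps salmon-bs "${runname[@]}" should print nothing if every sample was quantified with bootstraps.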

Kallisto with bootstrap

  • kallisto quant is run with 6 threads (-t 6) and with --rf-stranded as the appropriate library type

  • Bootstrapping is enabled by passing -b N, where N is a positive integer giving the number of bootstrap samples to compute

  • Unlike Salmon, kallisto does not create the top-level folder that holds the sample-specific output folders, so we create the top-level folder kallisto before running kallisto quant
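
The --rf-stranded option plays the same role for kallisto that --libType ISR plays for Salmon: both state that read 1 comes from the reverse strand. As a sketch, the correspondence for the common paired-end library types can be written as a small lookup; the function name kallisto_strand_flag is hypothetical and only these three library types are covered.

```shell
# Map a paired-end Salmon library type to the matching kallisto strand option.
# Usage: kallisto_strand_flag <libType>   (hypothetical helper)
kallisto_strand_flag() {
    case "$1" in
        ISR) echo "--rf-stranded" ;;  # stranded, read 1 from the reverse strand
        ISF) echo "--fr-stranded" ;;  # stranded, read 1 from the forward strand
        IU)  echo "" ;;               # unstranded: kallisto needs no strand flag
        *)   echo "unsupported libType: $1" >&2; return 1 ;;
    esac
}
```

For this dataset, kallisto_strand_flag ISR gives the option used in the command that follows.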

$ cd /home/USER/SSAPs

$ declare -a runname=("ERR2675454" "ERR2675455" "ERR2675458" "ERR2675459" "ERR2675460" "ERR2675461" "ERR2675464" "ERR2675465" "ERR2675468" "ERR2675469" "ERR2675472" "ERR2675473" "ERR2675476" "ERR2675477" "ERR2675478" "ERR2675479" "ERR2675480" "ERR2675481" "ERR2675484" "ERR2675485")

mkdir -p kallisto
for id in "${runname[@]}"; do
        trim1=trimmed/${id}_1.fastq.gz
        trim2=trimmed/${id}_2.fastq.gz

        kallisto quant -t 6 \
        -i /home/USER/db/refanno/gencode.v33_kallisto-0.46.2 \
        --rf-stranded -b 100 \
        -o kallisto/$id $trim1 $trim2
done
$ ls kallisto/ERR2675454
abundance.h5  abundance.tsv  run_info.json
kallisto/ERR2675454/abundance.tsv
target_id	length	eff_length	est_counts	tpm
ENST00000456328.2	1657	1493.74	3.28824	0.144435
ENST00000450305.2	632	468.87	0	0
ENST00000488147.1	1351	1187.74	67.8655	3.74895
ENST00000619216.1	68	16.6641	0	0
ENST00000473358.1	712	548.742	0	0
ENST00000469289.1	535	372.101	0	0
ENST00000607096.1	138	27.0133	0	0
ENST00000417324.1	1187	1023.74	0	0
ENST00000461467.1	590	426.87	0.5	0.0768524
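
Because both tools report TPM per transcript, their estimates for one sample can be placed side by side. The sketch below joins the two tables on the transcript ID; the function name compare_tpm is hypothetical, and it assumes the column layouts shown in this section (Salmon TPM in column 4 of quant.sf, kallisto tpm in column 5 of abundance.tsv) and identical transcript IDs in both files.

```shell
# Print transcript ID, Salmon TPM and kallisto TPM as a tab-separated table.
# Usage: compare_tpm <quant.sf> <abundance.tsv>   (hypothetical helper)
compare_tpm() {
    awk -F'\t' -v OFS='\t' '
        BEGIN { print "transcript", "salmon_tpm", "kallisto_tpm" }
        FNR == 1 { next }                          # skip the header of each file
        NR == FNR { salmon[$1] = $4; next }        # first file: remember Salmon TPM
        $1 in salmon { print $1, salmon[$1], $5 }  # second file: add kallisto tpm
    ' "$1" "$2"
}
```

For example, compare_tpm salmon/ERR2675454/quant.sf kallisto/ERR2675454/abundance.tsv | head shows the first few transcripts present in both outputs.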
