QC & trimming
QC
The QC process is run with 2 threads (-t 2) and took about 1.3 hours to complete. Results are placed in the fastqc folder.
$ cd /home/USER/SSAPs
$ mkdir fastqc
$ declare -a runname=("ERR2675454" "ERR2675455" "ERR2675458" "ERR2675459" "ERR2675460" "ERR2675461" "ERR2675464" "ERR2675465" "ERR2675468" "ERR2675469" "ERR2675472" "ERR2675473" "ERR2675476" "ERR2675477" "ERR2675478" "ERR2675479" "ERR2675480" "ERR2675481" "ERR2675484" "ERR2675485")
for id in "${runname[@]}"; do
    fq1=fastqs/${id}_1.fastq.gz
    fq2=fastqs/${id}_2.fastq.gz
    fastqc -t 2 --extract -o fastqc "$fq1" "$fq2"
done
Results can be viewed by opening the *.html files in a web browser, or by reading summary.txt and fastqc_data.txt in the output folders.
fastqc/ERR2675454_1_fastqc/summary.txt
PASS Basic Statistics ERR2675454_1.fastq.gz
PASS Per base sequence quality ERR2675454_1.fastq.gz
PASS Per tile sequence quality ERR2675454_1.fastq.gz
PASS Per sequence quality scores ERR2675454_1.fastq.gz
WARN Per base sequence content ERR2675454_1.fastq.gz
PASS Per sequence GC content ERR2675454_1.fastq.gz
PASS Per base N content ERR2675454_1.fastq.gz
PASS Sequence Length Distribution ERR2675454_1.fastq.gz
FAIL Sequence Duplication Levels ERR2675454_1.fastq.gz
PASS Overrepresented sequences ERR2675454_1.fastq.gz
FAIL Adapter Content ERR2675454_1.fastq.gz
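With 40 reports to go through, it can help to list only the modules that did not pass. The following is a minimal sketch, assuming each summary.txt is tab-separated (status, module name, file name) as FastQC writes it. The function name summarize_fastqc is a hypothetical helper, not part of FastQC.

```shell
# summarize_fastqc is a hypothetical helper, not a FastQC command.
# Assumes each summary.txt is tab-separated: status, module, file name.
summarize_fastqc() {
    local dir=${1:-fastqc}
    # Print file name, status and module for every line that is not PASS.
    awk -F'\t' '$1 != "PASS" { print $3 "\t" $1 "\t" $2 }' \
        "$dir"/*_fastqc/summary.txt
}

# Usage (run from /home/USER/SSAPs after FastQC has finished):
#   summarize_fastqc fastqc
```

For the report shown above, this would print the WARN and FAIL lines only, which makes it easier to see whether a problem such as adapter content is shared by all runs.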
fastqc/ERR2675454_1_fastqc/fastqc_data.txt
##FastQC 0.11.9
>>Basic Statistics pass
#Measure Value
Filename ERR2675454_1.fastq.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 30273560
Sequences flagged as poor quality 0
Sequence length 151
%GC 47
>>END_MODULE
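The Total Sequences value can also be pulled out of every fastqc_data.txt at once, for example to confirm that both mates of a pair report the same number of reads. This is a sketch under the assumption that fastqc_data.txt is tab-separated as FastQC writes it; fastqc_read_counts is a hypothetical helper name.

```shell
# fastqc_read_counts is a hypothetical helper, not a FastQC command.
# Assumes fastqc_data.txt is tab-separated ("Total Sequences<TAB>count").
fastqc_read_counts() {
    local dir=${1:-fastqc}
    # Remember the file name from the "Filename" row, then print it
    # together with the count from the "Total Sequences" row.
    awk -F'\t' '$1 == "Filename"        { name = $2 }
                $1 == "Total Sequences" { print name "\t" $2 }' \
        "$dir"/*_fastqc/fastqc_data.txt
}

# Usage (run from /home/USER/SSAPs after FastQC has finished):
#   fastqc_read_counts fastqc
```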
[Figure: Per base sequence quality of ERR2675454_1.fastq.gz]
Adapter removal and trimming
The trimming process is run with 6 threads (threads=6) and took about 1.6 hours to complete.
$ mkdir trimmed
for id in "${runname[@]}"; do
    adapters=/home/USER/tools/bbmap/resources/adapters.fa
    fq1=fastqs/${id}_1.fastq.gz
    fq2=fastqs/${id}_2.fastq.gz
    trim1=trimmed/${id}_1.fastq.gz
    trim2=trimmed/${id}_2.fastq.gz
    log=trimmed/${id}.log
    bbduk.sh threads=6 in1="$fq1" in2="$fq2" out1="$trim1" out2="$trim2" \
        ref="$adapters" tbo tpe ktrim=r k=21 mink=9 hdist=1 \
        qtrim=rl trimq=15 minlength=36 maxns=1 2> "$log"
done
# BBDuk parameters
tbo - trim adapters based on pair overlap detection using BBMerge
tpe - trim both reads to the same length
ktrim=r - once a reference kmer is matched in a read, that kmer and all the bases to the right will be trimmed, leaving only the bases to the left
k=21 - k-mer length used for finding contaminants
mink=9 - look for shorter k-mers at read tips, down to length 9
hdist=1 - maximum Hamming distance for ref k-mers
qtrim=rl trimq=15 - quality-trim both ends of each read to Q15 using the Phred algorithm
minlength=36 - discard reads shorter than 36 bp after trimming
maxns=1 - discard reads with more than 1 N after trimming
$ cd trimmed
$ ls
ERR2675454_1.fastq.gz ERR2675461_1.fastq.gz ERR2675472_1.fastq.gz ERR2675479_1.fastq.gz
ERR2675454_2.fastq.gz ERR2675461_2.fastq.gz ERR2675472_2.fastq.gz ERR2675479_2.fastq.gz
ERR2675454.log ERR2675461.log ERR2675472.log ERR2675479.log
ERR2675455_1.fastq.gz ERR2675464_1.fastq.gz ERR2675473_1.fastq.gz ERR2675480_1.fastq.gz
ERR2675455_2.fastq.gz ERR2675464_2.fastq.gz ERR2675473_2.fastq.gz ERR2675480_2.fastq.gz
ERR2675455.log ERR2675464.log ERR2675473.log ERR2675480.log
ERR2675458_1.fastq.gz ERR2675465_1.fastq.gz ERR2675476_1.fastq.gz ERR2675481_1.fastq.gz
ERR2675458_2.fastq.gz ERR2675465_2.fastq.gz ERR2675476_2.fastq.gz ERR2675481_2.fastq.gz
ERR2675458.log ERR2675465.log ERR2675476.log ERR2675481.log
ERR2675459_1.fastq.gz ERR2675468_1.fastq.gz ERR2675477_1.fastq.gz ERR2675484_1.fastq.gz
ERR2675459_2.fastq.gz ERR2675468_2.fastq.gz ERR2675477_2.fastq.gz ERR2675484_2.fastq.gz
ERR2675459.log ERR2675468.log ERR2675477.log ERR2675484.log
ERR2675460_1.fastq.gz ERR2675469_1.fastq.gz ERR2675478_1.fastq.gz ERR2675485_1.fastq.gz
ERR2675460_2.fastq.gz ERR2675469_2.fastq.gz ERR2675478_2.fastq.gz ERR2675485_2.fastq.gz
ERR2675460.log ERR2675469.log ERR2675478.log ERR2675485.log
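Rather than scanning the listing by eye, the presence of all three output files per run can be checked with a small loop. This is only a sketch: check_trimmed is a hypothetical helper, and it tests that each file exists and is non-empty, not that the gzip stream is intact.

```shell
# check_trimmed is a hypothetical helper, not part of BBTools.
# Usage: check_trimmed DIR ID [ID ...]
# Reports every expected output file that is missing or empty.
check_trimmed() {
    local dir=$1
    shift
    local id f missing=0
    for id in "$@"; do
        for f in "${id}_1.fastq.gz" "${id}_2.fastq.gz" "${id}.log"; do
            if [ ! -s "$dir/$f" ]; then
                echo "missing or empty: $dir/$f"
                missing=$((missing + 1))
            fi
        done
    done
    # Succeed only when nothing was reported.
    [ "$missing" -eq 0 ]
}

# Usage (run from /home/USER/SSAPs, with the runname array still defined):
#   check_trimmed trimmed "${runname[@]}"
```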
The generated log files report the number of reads and bases that were removed and the number that passed trimming.
trimmed/ERR2675454.log
Input: 60547120 reads 9142615120 bases.
QTrimmed: 19033902 reads (31.44%) 366761527 bases (4.01%)
KTrimmed: 26320518 reads (43.47%) 982152960 bases (10.74%)
Trimmed by overlap: 3581100 reads (5.91%) 18700662 bases (0.20%)
Low quality discards: 16230 reads (0.03%) 2046404 bases (0.02%)
Total Removed: 831678 reads (1.37%) 1369661553 bases (14.98%)
Result: 59715442 reads (98.63%) 7772953567 bases (85.02%)
Time: 300.984 seconds.
Reads Processed: 60547k 201.16k reads/sec
Bases Processed: 9142m 30.38m bases/sec
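To compare all 20 runs, the Result line of each log can be collected into one table. The sketch below assumes every log contains a line laid out like the Result line shown above (label, read count, "reads", percentage, base count, "bases", percentage); summarize_bbduk_logs is a hypothetical helper name, not part of BBTools.

```shell
# summarize_bbduk_logs is a hypothetical helper, not part of BBTools.
# Assumes each log has a line like:
#   Result: 59715442 reads (98.63%) 7772953567 bases (85.02%)
summarize_bbduk_logs() {
    local dir=${1:-trimmed}
    awk '/^Result:/ {
             n = FILENAME
             sub(/.*\//, "", n)       # drop the directory part
             sub(/\.log$/, "", n)     # drop the .log suffix
             gsub(/[()]/, "", $4)     # "(98.63%)" -> "98.63%"
             gsub(/[()]/, "", $7)
             # run, reads kept, % reads kept, bases kept, % bases kept
             print n "\t" $2 "\t" $4 "\t" $5 "\t" $7
         }' "$dir"/*.log
}

# Usage (run from /home/USER/SSAPs after trimming has finished):
#   summarize_bbduk_logs trimmed
```

A run whose retained percentage is far below the others would stand out in this table and may deserve a closer look at its full log.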