We will not run through the commands in this section live on ALPS1 in this class, because (1) there is not enough time and (2) each user account has only 10 GB. In principle, however, you can run the whole thing on /work3 once you have created your own folder there. Please try it in your own time before your test account expires. You can also try the whole tutorial on your own Linux system with the required software installed.
Also, if you are using ALPS1, remember to use the bsub command to submit your jobs to the LSF system!
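As a minimal sketch of what an LSF submission can look like, the snippet below writes a small job script and passes it to bsub on standard input. The script name, job name, and log file names are hypothetical placeholders, and the wrapped command is the genome decompression step from later in this section. A queue option (-q) is left out because queue names are site-specific; check the ALPS1 documentation for the right one.

```shell
# Write a hypothetical LSF job script; the job name and log file names are placeholders.
cat > unpack_genome.lsf <<'EOF'
#!/bin/bash
#BSUB -J unpack_genome
#BSUB -n 1
#BSUB -o unpack_genome.%J.out
#BSUB -e unpack_genome.%J.err
# %J in the two lines above is replaced by the LSF job ID.
cd ~/LSLNGS2015/GENOME_data
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
EOF

# Submit only where LSF is available; bsub reads the job script from standard input.
if command -v bsub >/dev/null 2>&1; then
    bsub < unpack_genome.lsf
else
    echo "bsub not found: run this on a cluster with LSF"
fi
```

Once a job is submitted, bjobs lists its status, and the .out and .err files collect whatever the job prints.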
Genome sequence and annotation
We download the human genome FASTA sequences and the annotation GTF file from the Ensembl FTP site (release 86).
Filename                                          Size
Homo_sapiens.GRCh38.86.gtf.gz                     44M
Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz    841M
cd ~/
mkdir LSLNGS2015
mkdir LSLNGS2015/GENOME_data LSLNGS2015/RNASEQ_data
cd ~/LSLNGS2015/GENOME_data
wget http://ftp.ensembl.org/pub/release-86/gtf/homo_sapiens/Homo_sapiens.GRCh38.86.gtf.gz
gunzip Homo_sapiens.GRCh38.86.gtf.gz
wget http://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
Create annotations in BED format
Below, we use a combination of commands to convert annotations recorded in the GTF file into BED format.
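As a minimal sketch of what such a conversion can look like (not necessarily the exact commands used in this tutorial), the snippet below pulls the gene records out of a GTF file with awk and writes them as 6-column BED. The file names sample.gtf and sample.genes.bed and the gene IDs GENE_A and GENE_B are made up for illustration. To run it on the real annotation, skip the printf lines and replace sample.gtf with Homo_sapiens.GRCh38.86.gtf. The key detail is the coordinate shift: GTF uses 1-based inclusive coordinates, whereas BED uses a 0-based start and an exclusive end, so only the start is decremented.

```shell
# Build a tiny hand-written GTF sample (hypothetical records, tab-separated).
printf '# sample header line\n' > sample.gtf
printf '1\thavana\tgene\t1000\t2000\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n' >> sample.gtf
printf '1\thavana\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE_A"; exon_number "1";\n' >> sample.gtf
printf '2\tensembl\tgene\t5000\t5600\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "lincRNA";\n' >> sample.gtf

# Keep only "gene" features and emit: chrom, start-1, end, gene_id, score, strand.
awk 'BEGIN { FS = OFS = "\t" }
     $0 !~ /^#/ && $3 == "gene" {
         # Locate gene_id "..." in the attribute column and strip the key and quotes.
         match($9, /gene_id "[^"]+"/)
         id = substr($9, RSTART + 9, RLENGTH - 10)
         print $1, $4 - 1, $5, id, 0, $7
     }' sample.gtf > sample.genes.bed

cat sample.genes.bed
```

The score column is filled with a constant 0 because the GTF score field is empty (".") for these records. Changing the feature filter from "gene" to "exon" or "transcript" gives the corresponding BED file for those features.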
Next, we download the RNA-Seq data of two adult female cell lines, GM12878 (ENCSR000AEC) and K562 (ENCSR000AEM), from the ENCODE website. Each experiment was performed with 2 replicates, and the libraries are stranded PE101 (paired-end, 101 bp) Illumina HiSeq RNA-Seq libraries prepared from rRNA-depleted poly-A+ RNA.