Obtain data and software

We will not run the commands in this section live on ALPS1 during class, because (1) there is not enough time and (2) each user account has only 10 GB of storage. In principle, though, you can run everything on /work3 once you have created your own folder there. Please try it in your own time before your test account expires. You can also work through the whole tutorial on your own Linux system, provided the required software is installed.

Also, if you are using ALPS1, remember to use the bsub command to submit your jobs to the LSF system!
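As a rough sketch of what a submission could look like, the hypothetical helper below wraps a command in bsub. The options -J (job name), -n (number of cores), -o and -e (output and error logs, where %J expands to the job ID) are standard LSF options, but the job name, core count and log names here are illustrative assumptions; ALPS1 may also require a queue (-q) that is not shown. By default the helper only prints the command; set RUN=1 to actually submit.

```shell
# Hypothetical helper: wrap a command in an LSF submission.
# $1 = job name; the remaining arguments are the command to run.
submit() {
  name="$1"
  shift
  if [ "${RUN:-0}" = "1" ]; then
    # Really submit the job (requires bsub, i.e. an LSF system such as ALPS1).
    bsub -J "$name" -n 1 -o "$name.%J.out" -e "$name.%J.err" "$@"
  else
    # Dry run: only show what would be submitted.
    echo "bsub -J $name -n 1 -o $name.%J.out -e $name.%J.err $*"
  fi
}

# Example: decompress the annotation file as a batch job.
submit gunzip_gtf gunzip Homo_sapiens.GRCh38.86.gtf.gz
```

Check the printed command first, then rerun with RUN=1 once it looks right.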

ALPS1 work3 folder

Genome sequence and annotation

We download the human genome sequence (FASTA) and the gene annotation (GTF) from the Ensembl FTP site.

Filename                                          Size
Homo_sapiens.GRCh38.86.gtf.gz                     44M
Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz    841M
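The download could look something like the sketch below. The directory layout under release-86 is an assumption based on the usual structure of the Ensembl FTP site, so check the site if a path has moved. The RUN switch is a hypothetical convenience: by default the script only prints the wget commands.

```shell
# Assumed Ensembl release 86 layout; verify on the FTP site before use.
BASE="ftp://ftp.ensembl.org/pub/release-86"
GTF_URL="$BASE/gtf/homo_sapiens/Homo_sapiens.GRCh38.86.gtf.gz"
FA_URL="$BASE/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"

# Print the commands by default; set RUN=1 to actually download.
for url in "$GTF_URL" "$FA_URL"; do
  if [ "${RUN:-0}" = "1" ]; then
    wget -c "$url"   # -c resumes a partially downloaded file
  else
    echo "wget -c $url"
  fi
done
```

After downloading, compare the file sizes with the table above as a quick sanity check.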

Create annotations in BED format

Below, we use a combination of commands to convert annotations recorded in the GTF file into BED format.
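One possible way to do the conversion is sketched here with awk; it is not necessarily the exact command combination used in class. The function name gtf_to_bed is hypothetical. It keeps only the rows of one feature type, shifts the 1-based GTF start to the 0-based BED start, and uses the chosen ID attribute as the BED name. The score column is filled with 0 as a placeholder.

```shell
# Hypothetical helper: read GTF on stdin, write 6-column BED on stdout.
# $1 = feature type in GTF column 3 ("gene" or "transcript")
# $2 = attribute to use as the BED name ("gene_id" or "transcript_id")
gtf_to_bed() {
  awk -F'\t' -v feature="$1" -v key="$2" 'BEGIN { OFS = "\t" }
    $0 !~ /^#/ && $3 == feature {
      name = "."
      n = split($9, attrs, ";")          # attributes are separated by ";"
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)                # trim leading spaces
        if (index(a, key " ") == 1) {    # attribute starts with the key
          name = substr(a, length(key) + 2)
          gsub(/"/, "", name)            # strip the quotes around the value
        }
      }
      # GTF is 1-based and inclusive; BED is 0-based and half-open.
      print $1, $4 - 1, $5, name, 0, $7
    }'
}

# Only run the conversion if the annotation file has been downloaded.
GTF=Homo_sapiens.GRCh38.86.gtf.gz
if [ -f "$GTF" ]; then
  gzip -dc "$GTF" | gtf_to_bed gene gene_id > Homo_sapiens.GRCh38.86.gene.bed
  gzip -dc "$GTF" | gtf_to_bed transcript transcript_id > Homo_sapiens.GRCh38.86.transcript.bed
fi
```

Each output line holds the chromosome, start, end, ID, score and strand, separated by tabs.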

Use head to view the BED files

head Homo_sapiens.GRCh38.86.gene.bed

head Homo_sapiens.GRCh38.86.transcript.bed

RNA-seq raw reads

Next, we download RNA-Seq data for two cell lines derived from adult females, GM12878 (ENCSR000AEC) and K562 (ENCSR000AEM), from the ENCODE website. Each experiment was performed with 2 replicates; the libraries are stranded, paired-end 101 bp (PE101) Illumina HiSeq RNA-Seq libraries prepared from rRNA-depleted poly-A+ RNA.

Filename                    Sample ID      Size
GM12878.rep1.R1.fastq.gz    ENCFF001RFH    7.9G
GM12878.rep1.R2.fastq.gz    ENCFF001RFG    8.0G
GM12878.rep2.R1.fastq.gz    ENCFF001RFB    6.9G
GM12878.rep2.R2.fastq.gz    ENCFF001RFA    7.1G
K562.rep1.R1.fastq.gz       ENCFF001RED    7.2G
K562.rep1.R2.fastq.gz       ENCFF001RDZ    7.4G
K562.rep2.R1.fastq.gz       ENCFF001REG    8.8G
K562.rep2.R2.fastq.gz       ENCFF001REF    9.1G
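The files in the table could be fetched and renamed along the lines of the sketch below. The URL pattern (files/ID/@@download/ID.fastq.gz) is an assumption about how the ENCODE portal serves files, and fastq_url and the RUN switch are hypothetical helpers. By default the loop only prints the wget commands, which is the safer choice given that the eight files total more than 60 GB.

```shell
# Assumed download pattern on the ENCODE portal; verify before use.
ENCODE="https://www.encodeproject.org/files"

# Hypothetical helper: build the download URL for one ENCODE file ID.
fastq_url() {
  echo "$ENCODE/$1/@@download/$1.fastq.gz"
}

# Each line pairs an ID from the table with the file name used in this tutorial.
while read -r acc name; do
  if [ "${RUN:-0}" = "1" ]; then
    wget -O "$name" "$(fastq_url "$acc")"
  else
    echo "wget -O $name $(fastq_url "$acc")"
  fi
done <<'EOF'
ENCFF001RFH GM12878.rep1.R1.fastq.gz
ENCFF001RFG GM12878.rep1.R2.fastq.gz
ENCFF001RFB GM12878.rep2.R1.fastq.gz
ENCFF001RFA GM12878.rep2.R2.fastq.gz
ENCFF001RED K562.rep1.R1.fastq.gz
ENCFF001RDZ K562.rep1.R2.fastq.gz
ENCFF001REG K562.rep2.R1.fastq.gz
ENCFF001REF K562.rep2.R2.fastq.gz
EOF
```

On ALPS1, run the downloads inside your /work3 folder, since they will not fit in a 10 GB home quota.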

Software

STAR (Spliced Transcripts Alignment to a Reference)

  • v2.5.2b [20 Aug 2016] - Latest version available at the time of writing and used in this exercise

  • v2.3.0e [14 Feb 2013] - Latest version available on ALPS1

RSEM (RNA-Seq by Expectation-Maximization)

  • v1.3.0 [02 Oct 2016] - Latest version available at the time of writing

  • v1.2.31 [04 Jun 2016] - Version used in this exercise

  • v1.2.19 [05 Nov 2014] - Latest version available on ALPS1

SAMtools

  • v1.3.1 [22 Apr 2016] - Latest version available at the time of writing and used in this exercise

  • v1.2 [02 Feb 2015] - Latest version available on ALPS1
