Create mapping indices
Before we can perform NGS read mapping, we will create the genome indices using the genome FASTA file as input. You can re-use these indices in all your future RNA-seq mapping. However, if you wish to map to a different genome build/assembly, you have to re-run this step using the new genome sequences and save the indices in a different directory. You might also need to build new indices when using a newer version of the software; always check the relevant README or CHANGELOG.
Here, we will create indices for STAR and RSEM.
STAR
Usage
STAR --runMode genomeGenerate --genomeDir path_to_genomedir --genomeFastaFiles reference_fasta_file(s)
Execute
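The following is a minimal sketch of an index-building run, not necessarily the exact command behind the resource figures in this section. The file names `genome.fa` and `genes.gtf` and the thread count of 12 are assumptions; the output directory is the one listed later in this section. By default the script only prints the command; setting `RUN=1` executes it.

```shell
#!/bin/sh
set -eu

# Assumed inputs; replace with your own paths.
GENOME_DIR="${GENOME_DIR:-$HOME/LSLNGS2015/GENOME_data/star}"
GENOME_FASTA="${GENOME_FASTA:-genome.fa}"      # hypothetical file name
ANNOTATION_GTF="${ANNOTATION_GTF:-genes.gtf}"  # hypothetical file name
THREADS="${THREADS:-12}"                       # assumed thread count

# Store the command in the positional parameters so that quoting is preserved.
set -- STAR \
  --runThreadN "$THREADS" \
  --runMode genomeGenerate \
  --genomeDir "$GENOME_DIR" \
  --genomeFastaFiles "$GENOME_FASTA" \
  --sjdbGTFfile "$ANNOTATION_GTF"

if [ "${RUN:-0}" = "1" ]; then
  mkdir -p "$GENOME_DIR"   # STAR expects the directory to exist already
  "$@"
else
  echo "$*"                # dry run: show the command only
fi
```

Printing the command first makes it easy to review the paths before committing to a run that may take more than an hour.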
Resource usage
ALPS Queue Name: 128G
CPU Time: 51610.02 sec.
Max Memory: 31 GB
Duration: 1 hour 15 minutes 6 seconds
Options
--runThreadN
defines the number of threads to be used for genome generation.
--runMode genomeGenerate
directs STAR to run the genome index generation job.
--genomeDir
path to the directory where the genome indices are stored. This directory has to be created (with mkdir) before the STAR run, and you need write permission for it. The file system needs to have at least 100 GB of disk space available for a typical mammalian genome.
--genomeFastaFiles
one or more FASTA files with the genome reference sequences.
--sjdbGTFfile
path to the transcript annotation in the standard GTF format.
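The requirements on the --genomeDir directory (it must exist, be writable, and sit on a file system with enough free space) can be checked before starting the run. The sketch below is one way to do this. The function name `check_genome_dir` is hypothetical, and the 100 GB default simply mirrors the figure quoted above.

```shell
#!/bin/sh
set -eu

# check_genome_dir DIR [REQUIRED_GB]
# Creates DIR if needed, verifies write permission, and warns when the
# file system holding DIR has less than REQUIRED_GB gigabytes free.
check_genome_dir() {
  dir=$1
  required_gb=${2:-100}

  mkdir -p "$dir"
  if [ ! -w "$dir" ]; then
    echo "error: no write permission for $dir" >&2
    return 1
  fi

  # df -P gives one line per file system; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  avail_gb=$(( avail_kb / 1024 / 1024 ))

  if [ "$avail_gb" -lt "$required_gb" ]; then
    echo "warning: ${avail_gb} GB free in $dir, ${required_gb} GB recommended" >&2
  fi
  return 0
}
```

A call such as `check_genome_dir ~/LSLNGS2015/GENOME_data/star 100` would prepare the directory used in this section. Low disk space only produces a warning, since smaller genomes need far less than 100 GB.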
Take a look at the STAR indices generated
ls -la ~/LSLNGS2015/GENOME_data/star
RSEM
Usage
rsem-prepare-reference [options] reference_fasta_file(s) reference_name
Execute
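The following is a minimal sketch of building the RSEM reference from a genome FASTA file plus a GTF annotation, not necessarily the exact command behind the resource figures in this section. The file names `genome.fa` and `genes.gtf` and the reference name `rsem_ref` are assumptions; the output directory is the one listed later in this section. By default the script only prints the command; setting `RUN=1` executes it.

```shell
#!/bin/sh
set -eu

# Assumed inputs; replace with your own paths.
RSEM_DIR="${RSEM_DIR:-$HOME/LSLNGS2015/GENOME_data/rsem}"
GENOME_FASTA="${GENOME_FASTA:-genome.fa}"      # hypothetical file name
ANNOTATION_GTF="${ANNOTATION_GTF:-genes.gtf}"  # hypothetical file name
REFERENCE_NAME="${REFERENCE_NAME:-rsem_ref}"   # hypothetical prefix for output files

# Store the command in the positional parameters so that quoting is preserved.
set -- rsem-prepare-reference \
  --gtf "$ANNOTATION_GTF" \
  "$GENOME_FASTA" \
  "$RSEM_DIR/$REFERENCE_NAME"

if [ "${RUN:-0}" = "1" ]; then
  mkdir -p "$RSEM_DIR"
  "$@"
else
  echo "$*"                # dry run: show the command only
fi
```

The last argument is a prefix rather than a directory: RSEM writes several files whose names start with it.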
Resource usage
ALPS Queue Name: 48G
CPU Time: 140.03 sec.
Max Memory: 2 GB
Duration: 2 minutes and 26 seconds
Options
--gtf
specifies the path to the gene annotations (in GTF format); with this option, RSEM assumes the FASTA file contains the sequence of a genome. If this option is omitted, RSEM assumes the FASTA file contains the reference transcripts, and the name of each sequence in the multi-FASTA file is used as its transcript_id.
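When the FASTA file holds transcripts rather than a genome, the sequence names double as transcript identifiers, so it can be worth confirming that they are unique before building the reference. The sketch below reports names that appear more than once. The function name `duplicate_fasta_names` and the demo sequences are hypothetical, and the check compares whole header lines, which is an assumption about what counts as the name.

```shell
#!/bin/sh
set -eu

# Print every sequence name that occurs more than once in a multi-FASTA file.
duplicate_fasta_names() {
  grep '^>' "$1" | sed 's/^>//' | sort | uniq -d
}

# Tiny demo file with hypothetical transcript names (tx1 appears twice).
demo_fasta=$(mktemp)
printf '>tx1\nACGT\n>tx2\nGGCC\n>tx1\nTTAA\n' > "$demo_fasta"

dups=$(duplicate_fasta_names "$demo_fasta")
echo "duplicated names: ${dups:-none}"

rm -f "$demo_fasta"
```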
Take a look at the RSEM indices generated
ls -la ~/LSLNGS2015/GENOME_data/rsem