Create mapping indices
Before we can map NGS reads, we need to create genome indices using the genome FASTA file as input. You can re-use these indices for all future RNA-seq mapping against the same genome. However, if you wish to map to a different genome build/assembly, you have to re-run this step with the new genome sequences and save the indices in a different directory. You might also need to build new indices when using a newer version of the software; always check the relevant README or CHANGELOG.
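One way to keep indices for different genome builds and software versions apart is to encode both in the directory name. The sketch below is only an illustration: the `index_dir` helper, the `INDEX_ROOT` variable, and the build and version strings are hypothetical and not part of this tutorial's setup.

```bash
#!/bin/sh
set -eu

# Hypothetical helper: compose an index directory path from the tool name,
# the tool version, and the genome build.
# Usage: index_dir TOOL VERSION BUILD
index_dir() {
  printf '%s/%s/%s_%s\n' "${INDEX_ROOT:-./indices}" "$3" "$1" "$2"
}

# Example build and version labels (placeholders, adjust to your installation).
index_dir STAR 2.7.10a GRCh38
index_dir STAR 2.7.10a GRCm39
```

With a layout like this, switching to a new build or upgrading the software leads to a new directory, so existing indices are never overwritten.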
Here, we will create indices for STAR and RSEM.
STAR --runMode genomeGenerate --genomeDir path_to_genomedir --genomeFastaFiles reference_fasta_file(s)
| ALPS Queue Name | CPU Time | Max Memory | Duration |
| --- | --- | --- | --- |
| 128G | 51610.02 sec. | 31 GB | 1 hour 15 minutes 6 seconds |
--runThreadN: defines the number of threads to be used for genome generation.
--runMode genomeGenerate: directs STAR to run the genome index generation job.
--genomeDir: path to the directory where the genome indices are stored. This directory has to be created (with mkdir) before the STAR run and needs to have write permissions. The file system needs at least 100 GB of free disk space for a typical mammalian genome.
--genomeFastaFiles: one or more FASTA files with the genome reference sequences.
--sjdbGTFfile: path to the transcript annotation in the standard GTF format.
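The options above can be combined into a single call, sketched below. All paths, file names, and the thread count are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is an assumption added for illustration: by default it only prints the commands, so the sketch can be tried without STAR installed.

```bash
#!/bin/sh
set -eu

# Hypothetical placeholders; override them via the environment.
GENOME_DIR="${GENOME_DIR:-./star_index}"
GENOME_FASTA="${GENOME_FASTA:-genome.fa}"
ANNOTATION_GTF="${ANNOTATION_GTF:-genes.gtf}"
THREADS="${THREADS:-4}"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# STAR expects the index directory to exist before it starts.
run mkdir -p "$GENOME_DIR"

# Generate the genome indices, including splice junctions from the GTF.
run STAR \
  --runThreadN "$THREADS" \
  --runMode genomeGenerate \
  --genomeDir "$GENOME_DIR" \
  --genomeFastaFiles "$GENOME_FASTA" \
  --sjdbGTFfile "$ANNOTATION_GTF"
```

Setting `DRY_RUN=0` runs the commands for real; given the memory and run time reported above, that is probably best done inside a cluster job rather than on a login node.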
ls -la ~/LSLNGS2015/GENOME_data/star
rsem-prepare-reference [options] reference_fasta_file(s) reference_name
| ALPS Queue Name | CPU Time | Max Memory | Duration |
| --- | --- | --- | --- |
| 48G | 140.03 sec. | 2 GB | 2 minutes and 26 seconds |
--gtf: specifies the path to the gene annotations (in GTF format); with this option, RSEM assumes the FASTA file contains the sequence of a genome. If this option is off, RSEM will assume the FASTA file contains the reference transcripts, and the name of each sequence in the multi-FASTA file is taken as its transcript_id.
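The two input modes described above can be sketched as follows. The file names and the reference name prefix are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is an assumption added for illustration: by default it only prints the commands instead of executing them.

```bash
#!/bin/sh
set -eu

# Hypothetical placeholders; override them via the environment.
GENOME_FASTA="${GENOME_FASTA:-genome.fa}"
ANNOTATION_GTF="${ANNOTATION_GTF:-genes.gtf}"
TRANSCRIPT_FASTA="${TRANSCRIPT_FASTA:-transcripts.fa}"
REF_NAME="${REF_NAME:-./rsem_index/reference}"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# reference_name is used as a file prefix, so its parent directory must exist.
run mkdir -p "$(dirname "$REF_NAME")"

# Mode 1: genome FASTA plus GTF; RSEM extracts the transcripts itself.
run rsem-prepare-reference --gtf "$ANNOTATION_GTF" "$GENOME_FASTA" "$REF_NAME"

# Mode 2: no --gtf; the FASTA file already contains the transcript sequences.
run rsem-prepare-reference "$TRANSCRIPT_FASTA" "${REF_NAME}_transcripts"
```

Only one of the two modes is needed for a given reference; both are shown here so the difference in arguments is visible side by side.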
ls -la ~/LSLNGS2015/GENOME_data/rsem