DTU analysis using DEXSeq
Load required libraries
library(data.table)
library(GenomicFeatures)
library(tximport)
library(DEXSeq)
library(stageR)
library(reshape2)
library(ggplot2)
library(ggbeeswarm)Import input data
Load clinical information and create a simple sample data.frame samps
dir = "../../"
sampleData = paste0(dir, "clinical.txt")
sampleData = fread(sampleData)
rownames(sampleData) = sampleData$ENA_RUN
sampleData$individual = as.factor(sampleData$individual)
sampleData$paris_classification = as.factor(sampleData$paris_classification)
# Use relevel() to set adjacent normal samples as reference
sampleData$paris_classification = relevel(sampleData$paris_classification, ref = "normal")
samps = data.frame(sample_id = sampleData$ENA_RUN, group = sampleData$paris_classification)Load TxDB and construct a two column data.frame txdf with the transcript and gene identifiers
Define file path and import Salmon NumReads values with tximport which contains the transcript-level counts (txIn = TRUE), abundances and average transcript lengths, then generate transcript-level estimated counts (txOut = TRUE) using the dtuScaledTPM method
Buld a data.frame cts containing the estimated counts and remove rows that has zero counts across all samples
The experiments have between 17.6 and 36.2 million paired-end reads that were mapped to the transcriptome using Salmon
Pre-filter with DRIMSeq dmFilter
Create a data.frame counts containing the gene_id (Gene ID) and feature_id (Transcript ID) information from subsetted txdf and estimated counts from cts
Then create a dmDSdata object d
We will filter the object using dmFilter before running DEXSeq workflow
DEXSeq procedure
Create a count table from the filtered d and build a DEXSeqDataSet from countData, providing the sample information, a design formula, transcript ID and gene ID
Performs DEXSeq analysis using the following 3 functions: (1) estimation of size factors, (2) estimation of dispersion, then (3) perform a likelihood ratio test for differential exon usage (in this case, transcripts). The results tables (log2 fold changes and p-values) can be generated using the DEXSeqResults function
Compute per-gene adjusted p-value using perGeneQValue, which aggregates evidence from multiple tests within a gene to a single p-value for the gene and then corrects for multiple testing across genes. Construct gene-level and transcript-level statistics
DEXSeq test identified 92 genes showing evidence of isoform switching involving 186 transcripts
stageR procedure
Set up a function to strip away the version numbers in the Ensembl gene and transcript IDs
Construct a vector of per-gene p-values for the screening stage. Per-gene adjusted p-value qval is used here
Construct a one column matrix of the per-transcript confirmation p-values
The dxr.t is used twice to construct a 4-column data.frame that contain both original IDs and IDs without version numbers
Construct a stageRTx object and perform the stageR analysis. We specify alpha to be 0.05
The data.frame dex.padj summarizes the adjusted p-values from the two-stage analysis. Only genes that passed the filter are included in the table. There are 92 screened genes in this dataset, and 149 transcripts pass the confirmation stage on a target 5% overall false discovery rate (OFDR). This means that, in expectation, no more than 5% of the genes that pass screening will either (1) not contain any DTU, so be falsely screened genes, or (2) contain a falsely confirmed transcript.
Exporting results
We construct a dexData data.frame to store per-group normalised mean and normalised counts of all samples, and a dexDTU data.frame to store the DEXSeq+stageR results. We merge both data.frame with a annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file before exporting the results in tabular format
Write results to files
Exploring results
We will use the plotExpression to plot the transcript expression of a given gene. In the example below, we select a gene from the data.frame dex.padj that has smallest transcript and gene adjusted p-values, and with data from dex.norm

Session info
Last updated
Was this helpful?