DGE and DTE analysis of Salmon/Kallisto inputs using Sleuth

DGE analysis

cd /home/USER/SSAPs/analysis/dge
R

Load required libraries

library(data.table)
library(sleuth)

Import clinical data

Load clinical information and define file path

dir = "../../"
sampleData = paste0(dir, "clinical.txt")
sampleData = fread(sampleData)
rownames(sampleData) = sampleData$ENA_RUN
sampleData$individual = as.factor(sampleData$individual)
sampleData$paris_classification = as.factor(sampleData$paris_classification)

# Use relevel() to set adjacent normal samples as reference
sampleData$paris_classification = relevel(sampleData$paris_classification, ref = "normal")

Construct t2g data.frame

Load the annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file and constrct a transcript-to-gene data.frame t2g in the specified format by Sleuth

Construct s2c data.frame

Salmon

Create a sample_to_covariates data.frame s2c that contains the required information in the specified format by Sleuth

Sleuth will compare all conditions against the reference level (i.e. first level in the factor variable). Use the levels() function to check the order of the condition and userelevel() function to reassign reference level if necessary

Kallisto

Create a sample_to_covariates data.frame s2c that contains the required information in the specified format by Sleuth

Construct Sleuth object

Construct the Sleuth object so with sleuth_prep. This object will store the metadata, model to be used for differential testing, transcript-to-gene and bootstrap information

Use aggregation_column = "ens_gene" to allow summarize the data on the gene level and gene_mode = TRUE to do counts aggregation at the gene level for normalization, transformation, and downstream steps.

The default transformation of counts is natural log, i.e. log(x+0.5). We change this to log base 2 by assigning an appropriate transformation_function

Differential expression analysis

The likelihood ratio test is performed with a 3-step procedure. First, fit the full model by fit_name = "full", then performs a second fit to a “reduced” model that presumes abundances are equal in the two conditions. Finally, Sleuth will perform LRT to identify transcripts with a significantly better fit with the “full” model

The models that have been fit can be examined with the models() function.

Retrieve results of the test

Exporting results

We construct a normData data.frame to store per-group normalised mean transcripts per million (TPM) and normalised TPM of all samples, and a deData data.frame to store the Sleuth LRT results. We merge both data.frame with a annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file before exporting the results in tabular format

Exploring results

Principal component plot of the samples

PCA

Plot sample heatmap using the Jensen-Shannon divergence

Lower divergence values represent samples that are more similar to each other

Plot clustered transcript heatmap

Create expression heatmap for top 40 genes with lowest q-values

Plot the normalized bootstraps across all samples

Plot expression with bootstrap variation for selected gene

Session info

DTE analysis

Load required libraries

Import clinical data

Load clinical information and define file path

Construct t2g data.frame

Load the annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file and constrct a transcript-to-gene data.frame t2g in the specified format by Sleuth

Construct s2c data.frame

Salmon

Create a data.frame s2c that contains the required information in the specified format by Sleuth

Sleuth will compare all conditions against the reference level (i.e. first level in the factor variable). Use the levels() function to check the order of the condition and userelevel() function to reassign reference level if necessary

Kallisto

Create a sample_to_covariates data.frame s2c that contains the required information in the specified format by Sleuth

Construct Sleuth object

Construct the Sleuth object so with sleuth_prep. This object will store the metadata, model to be used for differential testing, transcript-to-gene and bootstrap information.

The default transformation of counts is natural log, i.e. log(x+0.5). We change this to log base 2 by assigning an appropriate transformation_function

Differential expression analysis

The likelihood ratio test is performed with a 3-step procedure. First, fit the full model by fit_name = "full", then performs a second fit to a “reduced” model that presumes abundances are equal in the two conditions. Finally, Sleuth will perform LRT to identify transcripts with a significantly better fit with the “full” model

The models that have been fit can be examined with the models() function.

Retrieve results of the test

Exporting results

We construct a normData data.frame to store per-group normalised mean transcripts per million (TPM) and normalised TPM of all samples, and a deData data.frame to store the Sleuth LRT results. We merge both data.frame with a annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file before exporting the results in tabular format

Exploring results

Principal component plot of the samples

PCA

Plot sample heatmap using the Jensen-Shannon divergence

Lower divergence values represent samples that are more similar to each other

Sample heatmap

Plot clustered transcript heatmap

Create expression heatmap for top 40 transcripts with lowest q-values

Transcript heatmap

Plot the normalized bootstraps across all samples

Plot expression with bootstrap variation for selected transcript

Session info

Last updated

Was this helpful?