DGE and DTE analysis of Salmon/Kallisto inputs using Sleuth
Analysis and result presented was performed with Salmon counts, Code snippet to import Kallisto counts is also provided
DGE analysis
cd /home/USER/SSAPs/analysis/dge
R
Load required libraries
library(data.table)
library(sleuth)
Import clinical data
Load clinical information and define file path
dir = "../../"
sampleData = paste0(dir, "clinical.txt")
sampleData = fread(sampleData)
rownames(sampleData) = sampleData$ENA_RUN
sampleData$individual = as.factor(sampleData$individual)
sampleData$paris_classification = as.factor(sampleData$paris_classification)
# Use relevel() to set adjacent normal samples as reference
sampleData$paris_classification = relevel(sampleData$paris_classification, ref = "normal")
Construct t2g data.frame
Load the annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file and constrct a transcript-to-gene data.frame t2g in the specified format by Sleuth
> s2c
sample condition path
ERR2675454 ERR2675454 0-IIa ../../salmon-bs/ERR2675454
ERR2675455 ERR2675455 normal ../../salmon-bs/ERR2675455
ERR2675458 ERR2675458 0-IIa ../../salmon-bs/ERR2675458
ERR2675459 ERR2675459 normal ../../salmon-bs/ERR2675459
ERR2675460 ERR2675460 0-IIa ../../salmon-bs/ERR2675460
ERR2675461 ERR2675461 normal ../../salmon-bs/ERR2675461
ERR2675464 ERR2675464 0-IIa ../../salmon-bs/ERR2675464
ERR2675465 ERR2675465 normal ../../salmon-bs/ERR2675465
ERR2675468 ERR2675468 0-IIa ../../salmon-bs/ERR2675468
ERR2675469 ERR2675469 normal ../../salmon-bs/ERR2675469
ERR2675472 ERR2675472 0-IIa ../../salmon-bs/ERR2675472
ERR2675473 ERR2675473 normal ../../salmon-bs/ERR2675473
ERR2675476 ERR2675476 0-IIa ../../salmon-bs/ERR2675476
ERR2675477 ERR2675477 normal ../../salmon-bs/ERR2675477
ERR2675478 ERR2675478 0-IIa ../../salmon-bs/ERR2675478
ERR2675479 ERR2675479 normal ../../salmon-bs/ERR2675479
ERR2675480 ERR2675480 0-IIa ../../salmon-bs/ERR2675480
ERR2675481 ERR2675481 normal ../../salmon-bs/ERR2675481
ERR2675484 ERR2675484 0-IIa ../../salmon-bs/ERR2675484
ERR2675485 ERR2675485 normal ../../salmon-bs/ERR2675485
Sleuth will compare all conditions against the reference level (i.e. first level in the factor variable). Use the levels() function to check the order of the condition and userelevel() function to reassign reference level if necessary
> s2c
sample condition path
ERR2675454 ERR2675454 0-IIa ../../kallisto/ERR2675454
ERR2675455 ERR2675455 normal ../../kallisto/ERR2675455
ERR2675458 ERR2675458 0-IIa ../../kallisto/ERR2675458
ERR2675459 ERR2675459 normal ../../kallisto/ERR2675459
ERR2675460 ERR2675460 0-IIa ../../kallisto/ERR2675460
ERR2675461 ERR2675461 normal ../../kallisto/ERR2675461
ERR2675464 ERR2675464 0-IIa ../../kallisto/ERR2675464
ERR2675465 ERR2675465 normal ../../kallisto/ERR2675465
ERR2675468 ERR2675468 0-IIa ../../kallisto/ERR2675468
ERR2675469 ERR2675469 normal ../../kallisto/ERR2675469
ERR2675472 ERR2675472 0-IIa ../../kallisto/ERR2675472
ERR2675473 ERR2675473 normal ../../kallisto/ERR2675473
ERR2675476 ERR2675476 0-IIa ../../kallisto/ERR2675476
ERR2675477 ERR2675477 normal ../../kallisto/ERR2675477
ERR2675478 ERR2675478 0-IIa ../../kallisto/ERR2675478
ERR2675479 ERR2675479 normal ../../kallisto/ERR2675479
ERR2675480 ERR2675480 0-IIa ../../kallisto/ERR2675480
ERR2675481 ERR2675481 normal ../../kallisto/ERR2675481
ERR2675484 ERR2675484 0-IIa ../../kallisto/ERR2675484
ERR2675485 ERR2675485 normal ../../kallisto/ERR2675485
Construct Sleuth object
Construct the Sleuth object so with sleuth_prep. This object will store the metadata, model to be used for differential testing, transcript-to-gene and bootstrap information
Use aggregation_column = "ens_gene" to allow summarize the data on the gene level and gene_mode = TRUE to do counts aggregation at the gene level for normalization, transformation, and downstream steps.
The default transformation of counts is natural log, i.e. log(x+0.5). We change this to log base 2 by assigning an appropriate transformation_function
reading in kallisto results
dropping unused factor levels
....................
normalizing est_counts
51524 targets passed the filter
normalizing tpm
merging in metadata
aggregating by column: ens_gene
17692 genes passed the filter
summarizing bootstraps
....................
Differential expression analysis
The likelihood ratio test is performed with a 3-step procedure. First, fit the full model by fit_name = "full", then performs a second fit to a “reduced” model that presumes abundances are equal in the two conditions. Finally, Sleuth will perform LRT to identify transcripts with a significantly better fit with the “full” model
so = sleuth_fit(so, ~condition, fit_name = "full")
so = sleuth_fit(so, ~1, fit_name = "reduced")
so = sleuth_lrt(so, null_model = "reduced", alt_model = "full")
The models that have been fit can be examined with the models() function.
> models(so)
[ full ]
formula: ~condition
data modeled: obs_counts
transform sync'ed: TRUE
coefficients:
(Intercept)
condition0-IIa
[ reduced ]
formula: ~1
data modeled: obs_counts
transform sync'ed: TRUE
coefficients:
(Intercept)
Retrieve results of the test
res = sleuth_results(so, test = "reduced:full", test_type = "lrt", show_all = TRUE)
We construct a normData data.frame to store per-group normalised mean transcripts per million (TPM) and normalised TPM of all samples, and a deData data.frame to store the Sleuth LRT results. We merge both data.frame with a annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file before exporting the results in tabular format
dir = "../../"
sampleData = paste0(dir, "clinical.txt")
sampleData = fread(sampleData)
rownames(sampleData) = sampleData$ENA_RUN
sampleData$individual = as.factor(sampleData$individual)
sampleData$paris_classification = as.factor(sampleData$paris_classification)
# Use relevel() to set adjacent normal samples as reference
sampleData$paris_classification = relevel(sampleData$paris_classification, ref = "normal")
Construct t2g data.frame
Load the annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file and constrct a transcript-to-gene data.frame t2g in the specified format by Sleuth
> s2c
sample condition path
ERR2675454 ERR2675454 0-IIa ../../salmon-bs/ERR2675454
ERR2675455 ERR2675455 normal ../../salmon-bs/ERR2675455
ERR2675458 ERR2675458 0-IIa ../../salmon-bs/ERR2675458
ERR2675459 ERR2675459 normal ../../salmon-bs/ERR2675459
ERR2675460 ERR2675460 0-IIa ../../salmon-bs/ERR2675460
ERR2675461 ERR2675461 normal ../../salmon-bs/ERR2675461
ERR2675464 ERR2675464 0-IIa ../../salmon-bs/ERR2675464
ERR2675465 ERR2675465 normal ../../salmon-bs/ERR2675465
ERR2675468 ERR2675468 0-IIa ../../salmon-bs/ERR2675468
ERR2675469 ERR2675469 normal ../../salmon-bs/ERR2675469
ERR2675472 ERR2675472 0-IIa ../../salmon-bs/ERR2675472
ERR2675473 ERR2675473 normal ../../salmon-bs/ERR2675473
ERR2675476 ERR2675476 0-IIa ../../salmon-bs/ERR2675476
ERR2675477 ERR2675477 normal ../../salmon-bs/ERR2675477
ERR2675478 ERR2675478 0-IIa ../../salmon-bs/ERR2675478
ERR2675479 ERR2675479 normal ../../salmon-bs/ERR2675479
ERR2675480 ERR2675480 0-IIa ../../salmon-bs/ERR2675480
ERR2675481 ERR2675481 normal ../../salmon-bs/ERR2675481
ERR2675484 ERR2675484 0-IIa ../../salmon-bs/ERR2675484
ERR2675485 ERR2675485 normal ../../salmon-bs/ERR2675485
Sleuth will compare all conditions against the reference level (i.e. first level in the factor variable). Use the levels() function to check the order of the condition and userelevel() function to reassign reference level if necessary
> s2c
sample condition path
ERR2675454 ERR2675454 0-IIa ../../kallisto/ERR2675454
ERR2675455 ERR2675455 normal ../../kallisto/ERR2675455
ERR2675458 ERR2675458 0-IIa ../../kallisto/ERR2675458
ERR2675459 ERR2675459 normal ../../kallisto/ERR2675459
ERR2675460 ERR2675460 0-IIa ../../kallisto/ERR2675460
ERR2675461 ERR2675461 normal ../../kallisto/ERR2675461
ERR2675464 ERR2675464 0-IIa ../../kallisto/ERR2675464
ERR2675465 ERR2675465 normal ../../kallisto/ERR2675465
ERR2675468 ERR2675468 0-IIa ../../kallisto/ERR2675468
ERR2675469 ERR2675469 normal ../../kallisto/ERR2675469
ERR2675472 ERR2675472 0-IIa ../../kallisto/ERR2675472
ERR2675473 ERR2675473 normal ../../kallisto/ERR2675473
ERR2675476 ERR2675476 0-IIa ../../kallisto/ERR2675476
ERR2675477 ERR2675477 normal ../../kallisto/ERR2675477
ERR2675478 ERR2675478 0-IIa ../../kallisto/ERR2675478
ERR2675479 ERR2675479 normal ../../kallisto/ERR2675479
ERR2675480 ERR2675480 0-IIa ../../kallisto/ERR2675480
ERR2675481 ERR2675481 normal ../../kallisto/ERR2675481
ERR2675484 ERR2675484 0-IIa ../../kallisto/ERR2675484
ERR2675485 ERR2675485 normal ../../kallisto/ERR2675485
Construct Sleuth object
Construct the Sleuth object so with sleuth_prep. This object will store the metadata, model to be used for differential testing, transcript-to-gene and bootstrap information.
The default transformation of counts is natural log, i.e. log(x+0.5). We change this to log base 2 by assigning an appropriate transformation_function
reading in kallisto results
dropping unused factor levels
....................
normalizing est_counts
51524 targets passed the filter
normalizing tpm
merging in metadata
summarizing bootstraps
....................
Differential expression analysis
The likelihood ratio test is performed with a 3-step procedure. First, fit the full model by fit_name = "full", then performs a second fit to a “reduced” model that presumes abundances are equal in the two conditions. Finally, Sleuth will perform LRT to identify transcripts with a significantly better fit with the “full” model
so = sleuth_fit(so, ~condition, fit_name = "full")
so = sleuth_fit(so, ~1, fit_name = "reduced")
so = sleuth_lrt(so, null_model = "reduced", alt_model = "full")
The models that have been fit can be examined with the models() function.
> models(so)
[ full ]
formula: ~condition
data modeled: obs_counts
transform sync'ed: TRUE
coefficients:
(Intercept)
condition0-IIa
[ reduced ]
formula: ~1
data modeled: obs_counts
transform sync'ed: TRUE
coefficients:
(Intercept)
Retrieve results of the test
res = sleuth_results(so, test = "reduced:full", test_type = "lrt", show_all = TRUE)
We construct a normData data.frame to store per-group normalised mean transcripts per million (TPM) and normalised TPM of all samples, and a deData data.frame to store the Sleuth LRT results. We merge both data.frame with a annoData data.frame that contain per-transcript annotation derived from GENCODE GTF file before exporting the results in tabular format