# Building a TxDb object

This page shows how to build a `TxDb` annotation object from a Gene Transfer Format (GTF) file.

GTF has the same structure as GFF. Each record is a single line of tab-separated fields: `<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]`

The first lines of the GENCODE GTF file that we downloaded [here](/guide-to-rna-seq-analysis/preparations/softwares-and-databases.md#gencode) look like this:

{% code title="/home/USER/db/refanno/gencode.v33.annotation.gtf" %}

```text
##description: evidence-based annotation of the human genome (GRCh38), version 33 (Ensembl 99)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2019-12-13
chr1	HAVANA	gene	11869	14409	.	+	.	gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2";
chr1	HAVANA	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	11869	12227	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	12613	12721	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	HAVANA	exon	13221	14409	.	+	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_name "DDX11L1-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; hgnc_id "HGNC:37102"; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
```

{% endcode %}
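
To make the field layout concrete, the following base-R sketch splits one record into its nine columns and parses the attribute string. The record is the first exon line from the excerpt above with the attribute list shortened. The variable names (`record`, `gtf_cols`, `attr_list`) are illustrative and not part of any package.

```r
# One GTF record: first exon of DDX11L1, attributes shortened for readability
record <- paste(
  "chr1", "HAVANA", "exon", "11869", "12227", ".", "+", ".",
  'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; exon_number 1;',
  sep = "\t"
)

# Column names follow the field list given above (comments are optional)
gtf_cols <- c("seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "attributes")

# Fields are tab-separated
fields <- strsplit(record, "\t", fixed = TRUE)[[1]]
names(fields) <- gtf_cols

# Attributes are "key value;" pairs: split on ';', then separate key and value
attrs <- strsplit(fields[["attributes"]], ";\\s*")[[1]]
attrs <- attrs[nzchar(attrs)]
keys <- sub("\\s.*$", "", attrs)
values <- gsub('"', "", sub("^\\S+\\s+", "", attrs))
attr_list <- setNames(as.list(values), keys)

# Coordinates are 1-based and inclusive, so the width is end - start + 1
exon_width <- as.integer(fields[["end"]]) - as.integer(fields[["start"]]) + 1L

print(fields[c("seqname", "feature", "start", "end", "strand")])
print(attr_list$transcript_id)
print(exon_width)
```

In practice there is no need to parse the file by hand, since `makeTxDbFromGFF()` reads it directly. The sketch is only meant to show what each column holds.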

Start `R` in `/home/USER/db/refanno` and run:

```r
library(GenomicFeatures)

gtf <- "gencode.v33.annotation.gtf"
txdb.filename <- "gencode.v33.annotation.sqlite"

txdb <- makeTxDbFromGFF(gtf)

# Save the TxDb (an SQLite database) so it does not have to be rebuilt later
saveDb(txdb, txdb.filename)

# Reload the saved TxDb, for example in a later session
txdb <- loadDb(txdb.filename)
```
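
When `makeTxDbFromGFF()` is called with only a file name, the resulting object reports `Organism: NA` and uses the file name as its data source. A possible variant, assuming the same file names as above, fills in these fields through the `organism` and `dataSource` arguments and rebuilds the database only when no saved copy exists. The `dataSource` label is an arbitrary string chosen for this sketch. In recent Bioconductor releases the TxDb constructors are provided by the `txdbmaker` package, so the call may need adjusting there.

```r
library(GenomicFeatures)

gtf <- "gencode.v33.annotation.gtf"
txdb.filename <- "gencode.v33.annotation.sqlite"

if (file.exists(txdb.filename)) {
  # Reuse the saved database instead of parsing the GTF again
  txdb <- loadDb(txdb.filename)
} else {
  # Build from the GTF and record where the annotation came from
  txdb <- makeTxDbFromGFF(
    gtf,
    format = "gtf",
    dataSource = "GENCODE v33",   # free-text label, assumed for this sketch
    organism = "Homo sapiens"
  )
  saveDb(txdb, txdb.filename)
}
```

Parsing the full GENCODE file takes noticeably longer than loading the SQLite file, which is the reason for checking for the saved copy first.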

```r
> txdb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: gencode.v33.annotation.gtf
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 227912
# exon_nrow: 747278
# cds_nrow: 275239
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2020-04-27 15:47:27 +0100 (Mon, 27 Apr 2020)
# GenomicFeatures version at creation time: 1.38.2
# RSQLite version at creation time: 2.2.0
# DBSCHEMAVERSION: 1.2
```

```r
> genes(txdb)
GRanges object with 60662 ranges and 1 metadata column:
                     seqnames              ranges strand |            gene_id
                        <Rle>           <IRanges>  <Rle> |        <character>
  ENSG00000000003.15     chrX 100627108-100639991      - | ENSG00000000003.15
   ENSG00000000005.6     chrX 100584936-100599885      + |  ENSG00000000005.6
  ENSG00000000419.12    chr20   50934867-50958555      - | ENSG00000000419.12
  ENSG00000000457.14     chr1 169849631-169894267      - | ENSG00000000457.14
  ENSG00000000460.17     chr1 169662007-169854080      + | ENSG00000000460.17
                 ...      ...                 ...    ... .                ...
   ENSG00000288584.1     chr6 164148022-164152175      + |  ENSG00000288584.1
   ENSG00000288585.1     chr3 141449745-141456434      - |  ENSG00000288585.1
   ENSG00000288586.1     chr9   35603437-35605139      - |  ENSG00000288586.1
   ENSG00000288587.1     chr6   31400702-31463705      + |  ENSG00000288587.1
   ENSG00000288588.1     chr4     6245563-6261639      + |  ENSG00000288588.1
  -------
  seqinfo: 25 sequences (1 circular) from an unspecified genome; no seqlengths
```
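
Other accessors follow the same pattern as `genes()`. The sketch below builds a toy `TxDb` from the DDX11L1 records shown in the GTF excerpt (attributes abbreviated) so that it runs without the full download, then queries transcripts and exons. The toy file and the object names (`toy`, `toy_gtf`, `toy_txdb`) are assumptions made for illustration; with the real database the same calls apply to `txdb`.

```r
library(GenomicFeatures)

# Toy GTF: one gene, one transcript, three exons (coordinates from the excerpt)
tx_attrs <- 'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2";'
toy <- data.frame(
  seqname = "chr1",
  source = "HAVANA",
  feature = c("gene", "transcript", "exon", "exon", "exon"),
  start = c(11869L, 11869L, 11869L, 12613L, 13221L),
  end = c(14409L, 14409L, 12227L, 12721L, 14409L),
  score = ".",
  strand = "+",
  frame = ".",
  attributes = c('gene_id "ENSG00000223972.5";', rep(tx_attrs, 4))
)

# Write the records as tab-separated lines without header or quoting
toy_gtf <- tempfile(fileext = ".gtf")
write.table(toy, toy_gtf, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

toy_txdb <- makeTxDbFromGFF(toy_gtf, format = "gtf")

# All transcripts as a GRanges object
tx <- transcripts(toy_txdb)

# Exons grouped by transcript, with transcript IDs as list names
ex_by_tx <- exonsBy(toy_txdb, by = "tx", use.names = TRUE)

# Spliced transcript length is the sum of the exon widths
tx_len <- sum(width(ex_by_tx[["ENST00000456328.2"]]))

print(tx)
print(ex_by_tx)
print(tx_len)
```

`transcriptsBy()`, `cdsBy()` and `transcriptLengths()` take a `TxDb` in the same way and may be more convenient when results are needed per gene.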

