A DB-type anvi’o artifact. This artifact is typically generated, used, and/or exported by anvi’o (and not provided by the user).
anvi-analyze-synteny anvi-compute-functional-enrichment-in-pan anvi-compute-gene-cluster-homogeneity anvi-compute-genome-similarity anvi-db-info anvi-delete-misc-data anvi-delete-state anvi-display-pan anvi-export-items-order anvi-export-misc-data anvi-export-state anvi-get-sequences-for-gene-clusters anvi-import-collection anvi-import-items-order anvi-import-misc-data anvi-import-state anvi-merge-bins anvi-meta-pan-genome anvi-migrate anvi-show-collections-and-bins anvi-show-misc-data anvi-split anvi-summarize anvi-update-db-description anvi-script-add-default-collection
A pan-db is an anvi’o database that contains key information associated with your gene clusters. This makes it vital for pangenomic analysis, hence the name. If you want to learn more about the pangenomic workflow in anvi’o, it has its own tutorial here.
This is the output of the program anvi-pan-genome, which can be run after you’ve created a genomes-storage-db with the genomes you want to analyze. That program does the brunt of the pangenomic analysis: it calculates the similarity between all of the genes in your genomes-storage-db, clusters them, and organizes the final clusters. All of the results of that analysis are stored in a pan-db.
You can use a pan database to run a variety of pangenomic analyses, including anvi-compute-genome-similarity, anvi-analyze-synteny, and anvi-compute-functional-enrichment-in-pan. You can also view and interact with the data in a pan-db using anvi-display-pan.
To add additional information to the pangenome display, you’ll probably want to use anvi-import-misc-data.
While it is possible to read and write a given anvi’o pan database through SQLite functions directly, one can also use anvi’o libraries to initiate a pan database to read from.
import argparse
from anvio.dbops import PanSuperclass

# PanSuperclass takes its inputs from an argparse-style namespace
args = argparse.Namespace(pan_db="PAN.db", genomes_storage="GENOMES.db")
pan_db = PanSuperclass(args)
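For the direct SQLite route mentioned above, a minimal sketch using only Python’s standard library could look like the following. It lists the tables of any SQLite file and opens it read-only so that the inspection cannot modify the database. The helper name `list_tables` is hypothetical, and the sketch deliberately makes no assumption about which tables a pan-db contains.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    # Build a file: URI with mode=ro so the database is opened read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Calling `list_tables("PAN.db")` on an existing pan database would return its table names, which is a reasonable first step before writing any queries by hand.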
Once an instance of PanSuperclass is initiated, the following member function will give access to gene clusters:
pan_db.init_gene_clusters()
print(pan_db.gene_clusters)
{
"GC_00000001": {
"Genome_A": [19, 21],
"Genome_B": [30, 32],
"Genome_C": [122, 125],
"Genome_D": [44, 42]
},
"GC_00000002": {
"Genome_A": [123],
"Genome_B": [176],
"Genome_C": [175],
"Genome_D": []
},
(...)
"GC_00000036": {
"Genome_A": [],
"Genome_B": [24],
"Genome_C": [],
"Genome_D": []
}
(...)
Each item in this dictionary is a gene cluster, and describes the anvi’o gene caller ids of each gene from each genome that contributes to that cluster.
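Because this is a plain nested dictionary, it can be summarized with ordinary Python. The sketch below uses a toy dictionary with the same shape as the output above (not a real pan-db) and counts, for each gene cluster, how many genomes contribute to it and how many genes it holds. The function name `summarize_gene_clusters` and the simple definitions of "core" (present in every genome) and "singleton" (exactly one gene overall) are assumptions made for illustration.

```python
# Toy data shaped like pan_db.gene_clusters: cluster -> genome -> gene caller ids
gene_clusters = {
    "GC_00000001": {"Genome_A": [19, 21], "Genome_B": [30, 32],
                    "Genome_C": [122, 125], "Genome_D": [44, 42]},
    "GC_00000002": {"Genome_A": [123], "Genome_B": [176],
                    "Genome_C": [175], "Genome_D": []},
    "GC_00000036": {"Genome_A": [], "Genome_B": [24],
                    "Genome_C": [], "Genome_D": []},
}


def summarize_gene_clusters(gene_clusters):
    """Count contributing genomes and total genes for every gene cluster."""
    summary = {}
    for name, genomes in gene_clusters.items():
        summary[name] = {
            # A genome contributes only if its list of gene caller ids is non-empty.
            "num_genomes": sum(1 for ids in genomes.values() if ids),
            "num_genes": sum(len(ids) for ids in genomes.values()),
        }
    return summary


summary = summarize_gene_clusters(gene_clusters)
num_genomes_total = 4  # Genome_A through Genome_D in the toy data

# Clusters found in every genome, and clusters made of a single gene.
core = [n for n, s in summary.items() if s["num_genomes"] == num_genomes_total]
singletons = [n for n, s in summary.items() if s["num_genes"] == 1]

print(core)        # ['GC_00000001']
print(singletons)  # ['GC_00000036']
```

With a real database, the same function could be applied to `pan_db.gene_clusters` after `init_gene_clusters()` has been called.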
# Retrieve the sequences for a chosen set of gene clusters by name
gene_clusters_of_interest = {"GC_00000006", "GC_00000036"}
gene_cluster_sequences = pan_db.get_sequences_for_gene_clusters(gene_cluster_names=gene_clusters_of_interest)
print(gene_cluster_sequences)
{
"GC_00000006": {
"Genome_A": {
23: "MDVKKGWSGNNLND--NNNGSFTLFNAYLPQAKLANEAMHQKIMEMSAKAPNATMSITGHSLGTMISIQAVANLPQAD"
},
"Genome_B": {
34: "MDVKKGWSGNNLND--NNNGSFTLFNAYLPQAKLANEAMHQKIMEMSAKAPNATMSITGHSLGTMISIQAVANLPQAD"
},
"Genome_C": {
23: "MDVKKGWSGNNLNDWVNNNGSFTLFNAYLPQAKLANEAMHQKIMEMSAKAPNATMSITGHSLGTMISIQAVANLPQAD"
},
"Genome_D": {
23: "MDVKKGWSGNNLNDWVNNAGSFTLFNAYLPQAKLANEAMHQKIMEMSAKAPNATMSITGHSLGTMISIQAVANLPQAD"
}
},
"GC_00000036": {
"Genome_A": {},
"Genome_B": {
24: "MSKRHKFKQFMKKKNLNPMNNRKKVGIILFATSIGLFFLFAFRTTYIVATGKVAGVSLKEKTA"
},
"Genome_C": {},
"Genome_D": {}
}
}
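The returned structure nests gene cluster, then genome, then gene caller id, with the sequence as the innermost value; the `-` characters in the output above appear to be alignment gaps. As a rough sketch of how one might flatten such a dictionary into FASTA text, the example below uses short toy sequences rather than real data. The function name, the `strip_gaps` option, and the `cluster|genome|gene_caller_id` header layout are all hypothetical choices, not an anvi’o convention.

```python
def gene_cluster_sequences_to_fasta(gene_cluster_sequences, strip_gaps=False):
    """Flatten cluster -> genome -> gene caller id -> sequence into FASTA text."""
    lines = []
    for cluster in sorted(gene_cluster_sequences):
        genomes = gene_cluster_sequences[cluster]
        for genome in sorted(genomes):
            # Genomes that do not contribute to a cluster have an empty dict.
            for gene_id, sequence in sorted(genomes[genome].items()):
                if strip_gaps:
                    sequence = sequence.replace("-", "")
                lines.append(f">{cluster}|{genome}|{gene_id}")
                lines.append(sequence)
    return "\n".join(lines) + ("\n" if lines else "")


# Toy input with the same nesting as the output shown above.
toy_sequences = {
    "GC_X": {"Genome_A": {23: "MDVK--NN"}, "Genome_B": {34: "MDVKWVNN"}},
    "GC_Y": {"Genome_A": {}, "Genome_B": {24: "MSKR"}},
}

fasta_text = gene_cluster_sequences_to_fasta(toy_sequences, strip_gaps=True)
print(fasta_text)
```

For routine exports, the program anvi-get-sequences-for-gene-clusters listed at the top of this page is the more direct route; the sketch is only meant to show how the dictionary is laid out.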