MetagenomeDB 0.2.2

Metagenome sequences and annotations database
MIT/X Consortium License 
Aurelien Mazurie
MetagenomeDB is a Python library designed to store, retrieve, and annotate metagenomic sequences. It acts as an abstraction layer on top of a MongoDB database and provides an API to create, modify, and connect two types of objects, namely sequences and collections:

 * sequences (Sequence class) can be reads, contigs, PCR clones, etc.
 * collections (Collection class) represent sets of sequences; e.g., reads resulting from the sequencing of a sample, contigs assembled from a set of reads, or a PCR library

Any object can be annotated using a dictionary-like syntax:

# first, we import the library
import MetagenomeDB as mdb

# then we create a new Sequence object with two
# (mandatory) properties, 'name' and 'sequence'
s = mdb.Sequence({"name": "My sequence", "sequence": "atgc"})

# the object can now be annotated
print(s["length"])
s["type"] = "read"

# once modified, the object needs to be committed
# to the database for the modifications to persist
s.commit()

Objects of type Sequence or Collection can be connected to each other in order to represent various metagenomic datasets. Examples include, but are not limited to:

 * collection of reads resulting from a sequencing run (relationship between multiple Sequence objects and one Collection)
 * set of contigs resulting from the assembly of a set of reads (relationship between two Collection objects)
 * reads that are part of a contig (relationship between multiple Sequence objects and one Sequence)
 * sequence that is similar to another sequence (relationship between two Sequence objects)
 * collection that is part of a bigger collection (relationship between two Collection objects)
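
To make these relationships concrete, here is a minimal in-memory sketch of such a network. It does not use MetagenomeDB or MongoDB; the Network class, its connect() and targets() methods, and the object names are all hypothetical and only illustrate how sequences and collections can point to one another.

```python
# Toy model of a sequence/collection network.
# NOT the MetagenomeDB API: every name below is a hypothetical illustration.
from collections import defaultdict


class Network(object):
    def __init__(self):
        # maps a (kind, name) node to the set of nodes it points to
        self.edges = defaultdict(set)

    def connect(self, source, target):
        """Record a directed relationship from source to target."""
        self.edges[source].add(target)

    def targets(self, source, kind=None):
        """Return the sorted names of the nodes that source points to,
        optionally restricted to one kind ('sequence' or 'collection')."""
        return sorted(
            name for (node_kind, name) in self.edges[source]
            if kind is None or node_kind == kind
        )


net = Network()
reads = ("collection", "reads")
contigs = ("collection", "contigs")
contig1 = ("sequence", "contig1")

for name in ("read1", "read2"):
    read = ("sequence", name)
    net.connect(read, reads)    # a read belongs to a collection of reads
    net.connect(read, contig1)  # a read is part of a contig

net.connect(contig1, contigs)   # the contig belongs to a collection of contigs
net.connect(contigs, reads)     # the contigs were assembled from the reads
```

Calling net.targets(("sequence", "read1"), kind="collection") then lists the collections that read belongs to, which mirrors the kind of question the library's exploration methods answer.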

The result is a network of sequences and collections, which can be explored using dedicated methods; e.g., Collection.list_sequences(), Sequence.list_collections(), and Sequence.list_related_sequences(). Each of these methods accepts filters written in the MongoDB query syntax:

# list all collections of type 'collection_of_reads'
# that the sequence 's' belongs to
collections = s.list_collections({"type": "collection_of_reads"})

# list all sequences that also belong to these collections
# and have a length of at least 50 bp
for c in collections:
    print(c.list_sequences({"length": {"$gte": 50}}))
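
The filters passed to these methods are MongoDB query documents: a plain value means equality, and a nested dictionary holds comparison operators. The sketch below re-implements a small subset of that matching logic in plain Python, without any database, to show how such a filter is read. The matches() helper and the sample records are assumptions made for illustration; they are not part of MetagenomeDB, and real MongoDB matching covers many more operators and edge cases.

```python
# Simplified stand-in for MongoDB query matching (illustration only).
import operator

# the four ordering operators of the MongoDB query language
OPERATORS = {
    "$gt": operator.gt,    # strictly greater than
    "$gte": operator.ge,   # greater than or equal to
    "$lt": operator.lt,    # strictly less than
    "$lte": operator.le,   # less than or equal to
}


def matches(document, query):
    """Return True if the document satisfies every condition of the query."""
    for key, condition in query.items():
        if key not in document:
            return False
        value = document[key]
        if isinstance(condition, dict):
            # operator form, e.g. {"length": {"$gt": 50}}
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # plain form, e.g. {"type": "read"}, means equality
            return False
    return True


# hypothetical annotated sequences
records = [
    {"name": "a", "type": "read", "length": 40},
    {"name": "b", "type": "read", "length": 50},
    {"name": "c", "type": "contig", "length": 120},
]

longer_than_50 = [r["name"] for r in records if matches(r, {"length": {"$gt": 50}})]
at_least_50 = [r["name"] for r in records if matches(r, {"length": {"$gte": 50}})]
reads_at_least_50 = [
    r["name"] for r in records
    if matches(r, {"type": "read", "length": {"$gte": 50}})
]
```

Note the difference between the two operators: "$gt" excludes a sequence of exactly 50 bp, whereas "$gte" includes it.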

MetagenomeDB also provides a set of command-line tools to import nucleotide sequences, protein sequences, the output of the BLAST and FASTA alignment algorithms, and ACE assembly files. Other tools add or remove multiple objects, or annotate them.

Last updated on January 30th, 2011

