ODB is an object database frontend for Python.
ODB is a frontend to three different data storage engines. One of them is BDB (Berkeley DB); the other two are memory-resident databases modeled on BDB but written entirely in Python. The two Python engines are included with this package, while BDB and bsddb3 are separate installs.
The interface is very BDB-ish, only a lot cleaner. ODB tries to hide a lot of the grisly guts of the database interactions from the user.
The "Full" and "Mem" Engines.
ODB began its life as a simple wrapper around Berkeley DB. However, BDB caused us a lot of pain because of its locking issues. We would occasionally run into exceptions thrown when a deadlock was detected, and these were very hard to deal with in our environment - the "right thing" was to attempt the transaction again, which complicated our control logic and often led to further deadlocks. More troublesome were the cases where we leaked locks. It was very difficult for us to find out where the locks were being leaked from, and we ended up running database recovery quite a bit - not what you want for a 24/7 production environment.
So at some point I decided to give ODB its own in-memory database engine written completely in Python. Performance was not much of an issue for us, so we could get away with a single lock for the entire environment - hence no deadlock issues.
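As a rough illustration of why one environment-wide lock sidesteps deadlock, here is a minimal sketch. The `SingleLockEnv` class and its `transfer` method are hypothetical and are not part of ODB; the point is only that when every operation takes the same lock, there is no second lock to acquire in the wrong order.

```python
import threading

class SingleLockEnv:
    """Hypothetical environment guarded by one lock (not ODB's real API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.tables = {"a": 100, "b": 100}

    def transfer(self, src, dst, amount):
        # Every operation takes the same lock, so two threads can never
        # each hold one lock while waiting for the other.
        with self._lock:
            self.tables[src] -= amount
            self.tables[dst] += amount

env = SingleLockEnv()

def worker(src, dst, count):
    for _ in range(count):
        env.transfer(src, dst, 1)

# Two threads touch the same pair of tables in opposite order.
threads = [
    threading.Thread(target=worker, args=("a", "b", 1000)),
    threading.Thread(target=worker, args=("b", "a", 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The cost is that all access is serialized, which is acceptable only when throughput matters less than simplicity.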
The first incarnation was called "memdb" - it stored the database as a sequence of logfiles. When an instance of the database was started, it would load all of the logfiles into memory and rebuild the database one change at a time.
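The idea of rebuilding state from a log can be sketched as follows. This is an illustration only: the JSON record format and the `replay_logs` function are assumptions made for the example and say nothing about how memdb actually encoded its logfiles.

```python
import json

def replay_logs(log_lines):
    """Rebuild an in-memory dict by applying logged changes in order.

    Each line is assumed to be a JSON list: [operation, key, value].
    """
    db = {}
    for line in log_lines:
        op, key, value = json.loads(line)
        if op == "put":
            db[key] = value
        elif op == "del":
            db.pop(key, None)
    return db

# A tiny log: two inserts, a delete, and an overwrite.
log = [
    json.dumps(["put", "a", 1]),
    json.dumps(["put", "b", 2]),
    json.dumps(["del", "a", None]),
    json.dumps(["put", "b", 3]),
]
db = replay_logs(log)
```

Every change ever made is re-applied at startup, so load time grows with the full history of the database rather than with its current size.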
This worked well enough while the engine was running, but as you might imagine, the initial load overhead for a database of any significant size was completely insane. So I quickly added the ability to checkpoint - you could store the environment as one giant state file, load that at startup, and then read only the part of the log written after the checkpoint.
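A minimal sketch of checkpoint-plus-log recovery, under stated assumptions: `TinyStore` is a hypothetical class, a Python list stands in for the on-disk log, and `pickle` stands in for whatever state-file format the engine really used.

```python
import pickle

class TinyStore:
    """Hypothetical store illustrating checkpoint + log-tail recovery."""

    def __init__(self):
        self.data = {}
        self.log = []            # stands in for the on-disk log files
        self.checkpoint = None   # (serialized state, log position)

    def put(self, key, value):
        self.log.append(("put", key, value))
        self.data[key] = value

    def take_checkpoint(self):
        # Save the whole state plus the log position it corresponds to.
        self.checkpoint = (pickle.dumps(self.data), len(self.log))

    def recover(self):
        """Return (state, number of log records replayed)."""
        if self.checkpoint is None:
            state, start = {}, 0
        else:
            blob, start = self.checkpoint
            state = pickle.loads(blob)
        replayed = 0
        # Only records written after the checkpoint need replaying.
        for _op, key, value in self.log[start:]:
            state[key] = value
            replayed += 1
        return state, replayed

store = TinyStore()
store.put("a", 1)
store.put("b", 2)
store.take_checkpoint()
store.put("c", 3)
state, replayed = store.recover()
```

Here three records were logged but only one is replayed at recovery; the state file still has to be read in full, which is the remaining startup cost.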
This approach was better, but still had the disadvantage of requiring you to read the entire database at startup. It's not so bad that the database lives completely in memory - memory is big these days. The problem is having to perform a complete load at startup.
This is how the "full" implementation was born. The full implementation checkpoints the database to a heap file. This allows the internal btrees to lazy-load nodes as they are accessed, resulting in minimal startup time.
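Lazy loading from a heap file can be sketched like this. Everything here is an assumption made for illustration: the length-prefixed pickle record layout, the `write_record` helper and the `LazyNode` class are hypothetical and do not describe the "full" engine's actual file format. An in-memory `io.BytesIO` stands in for the heap file.

```python
import io
import pickle
import struct

def write_record(f, obj):
    """Append a length-prefixed pickled record and return its offset."""
    offset = f.seek(0, io.SEEK_END)
    payload = pickle.dumps(obj)
    f.write(struct.pack(">I", len(payload)))
    f.write(payload)
    return offset

class LazyNode:
    """Holds only a file offset until its contents are first requested."""

    reads = 0  # counts how many nodes were actually read from the file

    def __init__(self, f, offset):
        self._f = f
        self._offset = offset
        self._value = None
        self._loaded = False

    def get(self):
        if not self._loaded:
            self._f.seek(self._offset)
            (size,) = struct.unpack(">I", self._f.read(4))
            self._value = pickle.loads(self._f.read(size))
            self._loaded = True
            LazyNode.reads += 1
        return self._value

heap = io.BytesIO()  # stands in for the heap file on disk
offsets = [write_record(heap, {"keys": [i, i + 1]}) for i in range(3)]

# "Startup": create node handles without reading any node contents.
nodes = [LazyNode(heap, off) for off in offsets]
first = nodes[0].get()   # first access reads from the file
again = nodes[0].get()   # second access is served from memory
```

Startup touches only the offsets, so its cost no longer depends on how much data the nodes hold; each node is read at most once, on first use.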
At this time, the interfaces are mostly compatible with one another - if you code to the abstract odb interface (see "odb.odbi") it should be possible to switch back-ends without any code changes. If you want to make use of the special features of the "full" back-end, use that interface specifically:
from odb.full import FullEnv
env = FullEnv('database_dir')
- A comparison bug was fixed in the heap file free node list.
- Support for automatic recovery from log files was added.