Re: Berkeley DB 6.0 license change to AGPLv3
Florian Weimer wrote:
> * Howard Chu:
>> We require that fsync() (actually fdatasync()) doesn't lie. Data pages
>> can be written in any order, as long as all outstanding data pages are
>> actually written by the time fsync returns. Given this constraint, you
>> can pull the power on a drive and the DB will still be fine.
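The durability contract quoted here can be illustrated with a minimal POSIX-only Python sketch. It is not LMDB's code. The 4096-byte page size and the `write_pages`/`file_image` helpers are assumptions for illustration, and the sketch falls back to `os.fsync` where `os.fdatasync` is unavailable. Pages go to the OS in whatever order the caller supplies, followed by one sync barrier, and two different write orders produce the same file image.

```python
import os
import random
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch

# Use fdatasync() where the platform provides it, else fall back to fsync().
_sync = getattr(os, "fdatasync", os.fsync)


def write_pages(fd, pages):
    """Write (page number, bytes) pairs in the given order, then issue one
    sync barrier. The commit counts as durable only once the sync returns."""
    for pgno, data in pages:
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        os.pwrite(fd, data, pgno * PAGE_SIZE)
    _sync(fd)


def file_image(pages):
    """Write the pages to a fresh temp file in the given order and return
    the resulting file contents."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        write_pages(fd, pages)
        return os.pread(fd, os.fstat(fd).st_size, 0)


dirty = [(pgno, bytes([pgno]) * PAGE_SIZE) for pgno in (2, 5, 3, 7)]
shuffled = dirty[:]
random.shuffle(shuffled)
print(file_image(dirty) == file_image(shuffled))  # True: order does not matter
```

Nothing in the sketch can detect a drive that acknowledges the sync before the data reaches stable storage; that is why the guarantee depends on fsync() not lying.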
> And you do an fsync() as part of every transaction commit? Doesn't
> this force quite a lot of non-sequential writes? (Page 0 or 1, plus
> all the pages written to during the transaction.)
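Florian's "page 0 or 1" refers to a pair of meta pages. Below is a minimal sketch of how a commit can order its writes around them. It assumes a hypothetical 16-byte meta record of (txn_id, root_pgno) and meta pages chosen by transaction-id parity. Data pages are written and synced first, and only then is the older meta page overwritten and synced. A power cut after the first sync but before the meta write leaves both meta pages as they were, so on open the database finds the last committed transaction. The sketch does not guard against a torn write of the meta page itself, and its names and layout are illustrative, not LMDB's on-disk format.

```python
import os
import struct
import tempfile

PAGE_SIZE = 4096  # assumed page size
_sync = getattr(os, "fdatasync", os.fsync)
META = struct.Struct("<QQ")  # hypothetical meta record: (txn_id, root_pgno)


def commit(fd, txn_id, root_pgno, dirty):
    # Phase 1: data pages in any order, then a barrier.
    for pgno, data in dirty.items():
        os.pwrite(fd, data, pgno * PAGE_SIZE)
    _sync(fd)
    # Phase 2: overwrite the older of the two meta pages (0 or 1), then sync.
    meta = META.pack(txn_id, root_pgno).ljust(PAGE_SIZE, b"\0")
    os.pwrite(fd, meta, (txn_id % 2) * PAGE_SIZE)
    _sync(fd)


def current_meta(fd):
    # On open, the meta page with the higher txn_id names the live tree.
    metas = [META.unpack(os.pread(fd, META.size, p * PAGE_SIZE)) for p in (0, 1)]
    return max(metas)


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.ftruncate(fd, 2 * PAGE_SIZE)  # reserve zeroed meta pages 0 and 1
    commit(fd, 1, 2, {2: b"a" * PAGE_SIZE})
    commit(fd, 2, 3, {3: b"b" * PAGE_SIZE})
    print(current_meta(fd))  # (2, 3)
```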
Yes, but in fact we send writes to the OS in sorted order (ascending page
number). The net result is that performance on an HDD is better than random
seeks would suggest, because there are no head direction reversals in the
middle of a commit.
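The point about head direction reversals can be checked with a toy model. This Python sketch treats each target page number as a head position (an assumption that ignores caching, rotational latency and I/O scheduler reordering) and counts how often the direction of travel flips. The dirty page numbers are made up for illustration.

```python
def direction_reversals(pgnos):
    """Count how often a sequence of page numbers changes direction
    (ascending <-> descending): a crude proxy for HDD head reversals."""
    reversals = 0
    prev_dir = 0  # 0 = no movement seen yet
    for a, b in zip(pgnos, pgnos[1:]):
        step = (b > a) - (b < a)  # +1 up, -1 down, 0 same page
        if step == 0:
            continue
        if prev_dir and step != prev_dir:
            reversals += 1
        prev_dir = step
    return reversals


dirty = [912, 17, 455, 3, 1030, 88]  # made-up dirty page numbers
print(direction_reversals(dirty))          # 4 (arrival order)
print(direction_reversals(sorted(dirty)))  # 0 (ascending page order)
```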
(Assuming the underlying I/O scheduler doesn't get in the way. Usually we use
the noop scheduler.) Also we allocate and reuse pages in linear order, so very
frequently our writes are purely sequential. A trace of disk activity on
writes would be reminiscent of an old-style typewriter: gradual progress from
page 0/1 upward, followed by a carriage return.
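The "typewriter" pattern follows from linear allocation plus sorted writes. Here is a minimal sketch, assuming a hypothetical allocator that reuses the lowest-numbered free page first and otherwise extends the file, and assuming pages 0 and 1 are reserved for meta pages. `coalesce` groups a transaction's dirty pages into ascending contiguous runs, each of which could be issued as a single write. This illustrates the idea only; it is not how LMDB's freelist is actually implemented.

```python
import heapq


class LinearAllocator:
    """Hypothetical allocator: hand out the lowest-numbered free page first,
    otherwise extend the file at its end."""

    def __init__(self, first_data_pgno=2):  # assume pages 0 and 1 are meta pages
        self.free = []  # min-heap of reusable page numbers
        self.next_pgno = first_data_pgno

    def alloc(self):
        if self.free:
            return heapq.heappop(self.free)
        pgno = self.next_pgno
        self.next_pgno += 1
        return pgno

    def release(self, pgno):
        heapq.heappush(self.free, pgno)


def coalesce(pgnos):
    """Turn dirty page numbers into ascending (first_pgno, count) runs;
    each run can go to the OS as one contiguous write."""
    runs = []
    for p in sorted(set(pgnos)):
        if runs and p == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [tuple(r) for r in runs]


alloc = LinearAllocator()
txn1 = [alloc.alloc() for _ in range(5)]
print(coalesce(txn1))  # [(2, 5)]: one sequential write
alloc.release(3)
alloc.release(4)
txn2 = [alloc.alloc() for _ in range(3)]
print(coalesce(txn2))  # [(3, 2), (7, 1)]
```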
> It's curious that this results in competitive write performance.
> (WAL-based systems only need one sequential write during transaction
> commit.) Perhaps spreading out the writes at commit is worth all the
> seeking and the potentially redundant writes because it avoids
> write-induced stalls at checkpoints (or compaction events or whatever
> databases do to look good in benchmarks but fall over in practice :-).
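For contrast, the WAL commit Florian describes can be sketched as one append plus one sync. This POSIX-only Python sketch assumes a hypothetical record format (a length and CRC-32 header followed by the payload); the data pages themselves would be updated later at a checkpoint, which is not shown. Replay stops at the first torn or corrupt record.

```python
import os
import struct
import tempfile
import zlib

_sync = getattr(os, "fdatasync", os.fsync)
HDR = struct.Struct("<II")  # hypothetical record header: (length, crc32)


def wal_commit(fd, payload):
    """One sequential append plus one sync per commit. The data pages the
    record describes are written later, at checkpoint time (not shown)."""
    record = HDR.pack(len(payload), zlib.crc32(payload)) + payload
    os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, record)
    _sync(fd)


def wal_replay(fd):
    """Return payloads of all complete, checksum-valid records, stopping at
    the first torn or corrupt one (e.g. a record cut short by a crash)."""
    buf = os.pread(fd, os.fstat(fd).st_size, 0)
    off, out = 0, []
    while off + HDR.size <= len(buf):
        length, crc = HDR.unpack_from(buf, off)
        body = buf[off + HDR.size : off + HDR.size + length]
        if len(body) < length or zlib.crc32(body) != crc:
            break
        out.append(body)
        off += HDR.size + length
    return out


with tempfile.TemporaryFile() as f:
    wal_commit(f.fileno(), b"put a=1")
    wal_commit(f.fileno(), b"put b=2")
    print(wal_replay(f.fileno()))  # [b'put a=1', b'put b=2']
```

The trade-off under discussion is where the page writes happen: a WAL defers them to the checkpoint, while the scheme described earlier in the thread performs them, sorted, inside each commit.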
LMDB doesn't need dirty tricks to look good. (And at only 6 KLOC of source,
there's nowhere to hide any tricks anyway.)
This is as apples-to-apples real world as it gets, since the BDB backends in
OpenLDAP have been around for over a decade and tuned to the Nth degree.
We've spent years fighting with bad behaviors in DBs. We have none of them in
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/