Florian Weimer wrote:
* Howard Chu:
We require that fsync() (actually fdatasync()) doesn't lie. Data pages can be written in any order, as long as all outstanding data pages are actually written by the time fsync returns. Given this constraint, you can pull the power on a drive and the DB will still be fine.

And you do an fsync() as part of every transaction commit? Doesn't this force quite a bit of non-sequential writes? (Page 0 or 1, plus all the pages written to during the transaction.)
Yes, but in fact we send writes to the OS in sorted order (ascending page number). Net result is that performance on an HDD is better than random seek time because there are no head direction reversals in the middle of a commit. (Assuming the underlying I/O scheduler doesn't get in the way. Usually we use the noop scheduler.) Also we allocate and reuse pages in linear order, so very frequently our writes are purely sequential. A trace of disk activity on writes would be reminiscent of an old-style typewriter: gradual progress from 0/1 upward followed by a carriage return.
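To make the write ordering concrete, here is a minimal sketch of a commit that hands dirty pages to the OS in ascending page-number order and then syncs once. It is not LMDB's actual code: `commit_pages`, `demo`, the 4096-byte `PAGE_SIZE`, and the dictionary of dirty pages are all names and values assumed for illustration, and the meta page update that would follow the sync is left out.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch only


def commit_pages(fd, dirty_pages):
    """Write dirty pages in ascending page-number order, then sync once.

    dirty_pages maps page number -> bytes of exactly PAGE_SIZE.
    Returns the page numbers in the order they were written.
    """
    order = sorted(dirty_pages)  # ascending page number: no backward seeks
    for pgno in order:
        data = dirty_pages[pgno]
        if len(data) != PAGE_SIZE:
            raise ValueError("page %d has wrong size" % pgno)
        os.pwrite(fd, data, pgno * PAGE_SIZE)
    # Prefer fdatasync() where the platform has it; fall back to fsync().
    sync = getattr(os, "fdatasync", os.fsync)
    sync(fd)
    return order


def demo():
    """Commit three out-of-order dirty pages and read one of them back."""
    with tempfile.TemporaryFile() as f:
        dirty = {
            7: b"g" * PAGE_SIZE,
            2: b"b" * PAGE_SIZE,
            5: b"e" * PAGE_SIZE,
        }
        order = commit_pages(f.fileno(), dirty)
        page5 = os.pread(f.fileno(), PAGE_SIZE, 5 * PAGE_SIZE)
        return order, page5


if __name__ == "__main__":
    written, _ = demo()
    print(written)
```

The pages may be dirtied in any order during the transaction; only the order in which they are submitted at commit time is sorted, and durability still rests entirely on the sync call not lying.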
It's curious that this results in competitive write performance. (WAL-based systems only need one sequential write during transaction commit.) Perhaps spreading out the writes at commit is worth all the seeking and the potentially redundant writes because it avoids write-induced stalls at checkpoints (or compaction events or whatever databases do to look good in benchmarks but fall over in practice :-).
LMDB doesn't need dirty tricks to look good. (And at only 6KLOCs of source, there's nowhere to hide any tricks anyway.)
This is as apples-to-apples real world as it gets, since the BDB backends in OpenLDAP have been around for over a decade and tuned to the Nth degree.
http://wiki.zimbra.com/wiki/OpenLDAP_MDB_vs_HDB_performance

We've spent years fighting with bad behaviors in DBs. We have none of them in LMDB.
http://www.anchor.com.au/blog/2013/05/second-strike-with-lightning/

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/