
Re: KDE 4.4.3 upgrade eats 141 MB of /home



On Wed, May 12, 2010 at 10:41:35AM +0200, Kevin Krammer wrote:

> Since you are writing a bit down that you think it is caused by kres-migrator, 
> where did you get it from (here it seems to be part of the kdepim-runtime 
> package).

Yes, kres-migrator is part of kdepim-runtime.  I do have that package
installed, as it seems to be indirectly depended upon by kde-minimal.  It's
kdepim itself, and its application dependencies (kaddressbook, kalarm, kmail,
knode, knotes, kontact, korganizer, etc.) that I don't have installed.  Maybe
the kdepim-runtime dependency itself is a bug--can't say.

> kres-migrator is called when an application accesses the KResource framework, 
> e.g. some app accessing the old addressbook API.
> Not using KDEPIM apps does not necessarily mean non of your other applications 
> access PIM data.

Looks like the culprit here is libkabc.  There's a "Default Addressbook"
created by the library, that's presumably empty.  I'm not sure what's
loading libkabc in the first place.  I do know that I didn't even have kabc
database files (.kde/share/apps/kabc/std.vcf*) until upgrading to 4.4.
Maybe it's an explicit part of the migration?  Or I suppose one of the
panel widgets I'm using might depend on it now, but I don't believe that's
the case.

> Anyone know what kind of data is stored in these logs?

Looked into this a bit.  The InnoDB documentation itself is a little
lacking on describing its particular architecture, but there's an InnoDB
tuning tutorial [1] that's rather helpful.

These files serve as InnoDB's REDO logs.  They serve two purposes.  First,
committed transactions are written to the REDO logs sequentially, so that
table updates (with possible random seeks) can be done in a write-back mode
"at leisure."

Second, REDO logs serve as a durability measure.  Each time the database is
restarted, the REDO logs are replayed to ensure that recent transactions
have been properly committed--say if either the database is "kill -9ed" or
there's other table corruption.  They may also be used in recovery, whereby
if table corruption is found and old tables can be reloaded from backup,
then the REDO logs can be replayed to bring the tables up to date.  You can
also forward REDO logs to standby (fail-over) servers to ensure their
database tables are up to date.

The REDO logs themselves contain row updates from insert/update statements.
So for a given row length, the REDO logs contain the last
LOG_SIZE/ROW_LENGTH transactions.  They're not used in selects or other
non-mutating accesses.

REDO log size is not an issue of correctness.  A small log size might
result in decreased performance by forcing a burst of inserts/updates to be
committed to table before completing a transaction.  A larger log size may
also be of benefit in data recovery if database corruption is found, and a
recent enough table backup is maintained so that the REDO log still
contains all non-backed up transactions.

Let's try to quantify this a bit.  I'm not exactly sure what kind of
database workloads Akonadi is targeting, but for PIM applications we're
looking at managing (1) contacts, (2) calendar entries, (3) "TODO" tasks,
(4) notes-to-self, etc.  It seems to me that each of these things results
in:

- Table row length on order of 1 kB.
- Total number of rows < 10,000 (how many people do you know?)
- Largely read-only data sets, grows over period of years.
- A working set (actively updated rows) < 1,000 per day.  Probably < 100.

Thus, I would conclude that tables rarely grow larger than 10 MB (1 kB
* 10,000).  The volume of inserts/updates per table shouldn't exceed
100 kB-1 MB per day (100-1,000 rows at 1 kB each).  We're also unlikely to
see bursty updates anytime information is manually provided, since it has
to be typed in.  Bursty updates would happen on device synchronization, at
which point you might see 100 kB-1 MB of table updates in a few-second
window.

This means that we should be able to record 1-10 days of update history
with 1 MB transaction logs.  And that's under a heavy PIM load.
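
A rough sketch of that arithmetic, for anyone who wants to plug in their own
numbers.  The function names are made up for illustration, and it assumes
decimal units and no per-record overhead in the log (real log records carry
headers, so the true figures would be somewhat lower):

```python
KB = 1000
MB = 1000 * KB


def rows_in_log(log_bytes, row_bytes):
    """Approximate number of row updates a REDO log can hold
    (LOG_SIZE / ROW_LENGTH, ignoring log record overhead)."""
    return log_bytes // row_bytes


def days_of_history(log_bytes, daily_update_bytes):
    """How many days of updates fit in the log at a steady daily volume."""
    return log_bytes / daily_update_bytes


log_size = 1 * MB
print(rows_in_log(log_size, 1 * KB))         # 1 kB rows in a 1 MB log
print(days_of_history(log_size, 1 * MB))     # heavy day: 1 MB of updates
print(days_of_history(log_size, 100 * KB))   # light day: 100 kB of updates
```

With the estimates above this comes out to 1,000 row updates, or between 1
and 10 days of history.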

We also know that InnoDB's "leisure write" pace is 64 pages per second.
Each page is 16 kB.  If we're pessimistic, and say that row updates are
randomly distributed enough such that there's only a single updated row per
page, then it would take up to 1.6s-15.6s to "leisurely flush" 100 kB-1 MB
of table updates.  So it really only makes sense to increase the REDO log
beyond 1 MB for performance purposes if we expect 1,000 random-row updates
(1 MB) to occur more frequently than once every 15 seconds.  That
doesn't really seem plausible with these kinds of workloads.  In the event
it _does_ happen, then it just takes a little longer to finish the sync.
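
The flush-time estimate can be sketched like this.  It is only a model of the
pessimistic case described above (one dirty page per updated 1 kB row, flushed
at 64 pages per second); the function name and defaults are illustrative and
not anything InnoDB exposes:

```python
KB = 1000
MB = 1000 * KB


def leisure_flush_seconds(update_bytes, row_bytes=1 * KB, pages_per_second=64):
    """Worst-case time to write back a burst of row updates, assuming
    every updated row lands on a different page."""
    dirty_pages = update_bytes // row_bytes  # one page per updated row
    return dirty_pages / pages_per_second


print(leisure_flush_seconds(100 * KB))  # small sync burst
print(leisure_flush_seconds(1 * MB))    # large sync burst
```

If updates cluster on fewer pages, which seems likely for rows inserted
together, the real figure would be lower than this.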

So that's my argument for 1 MB REDO logs.  Let's look at the other defaults
for a bit:

innodb_log_buffer_size=1M -- Should be large enough to assemble a
transaction in memory.  1 MB fits 1000 1kB-single-row transactions.  Fine.

innodb_buffer_pool_size=80M -- Page cache size, including cached read
pages.  Seems a fairly large and wasteful use of RAM if we expect tables
themselves to grow no larger than 10 MB.  8 MB might be a more reasonable
default, and in the event that a database does grow large, the Linux buffer
cache should eliminate most of the disk fetches (unless the tables are
opened O_DIRECT, I'm not sure about that).

innodb_file_per_table=1 -- Each table (basically, application) uses its own
DB file.  Implicitly configured is each table having a minimum size of 10
MB, and growing in 8 MB increments.

Cutting to the chase: by the current configuration, each per-user MySQL
instance will use 80 MB of RAM for a database buffer pool, and will create
10 MB files per-table (per-application), but we expect tables to never
increase beyond that size.  Overall seems kind of wasteful for this type of
workload.  These are also defaults that are meant for the lower-end of
centralized DB workload-scenarios, not per-user PIM storage.  If InnoDB is
to be used long term, we could really benefit from constructing a sample
workload and (with a DBA's help) tuning the initial parameters
appropriately.

But if we're going to go with things as is, I've already made the case for
why a 1 MB REDO log should be sufficient.  I would actually claim now,
though, that 5 MB REDO logs wouldn't be unreasonable in this context
either.  Turns out that 5 MB REDO logs are the InnoDB default, and that
would mean that REDO logs would occupy the same amount of disk space as an
empty table.  Since we're paying a 10 MB penalty for each application to
use Akonadi in the first place, another 10 MB for the logs isn't extremely
egregious.
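
To put numbers on the disk side, here is a back-of-the-envelope sketch.  The
10 MB per-table minimum and the 5 MB log size are the figures quoted above;
the count of two log files is an assumption (it is what makes 5 MB logs add
up to "another 10 MB"), and the function name is made up:

```python
def innodb_disk_footprint_mb(tables, table_min_mb=10, log_file_mb=5, log_files=2):
    """Minimum on-disk size in MB: per-table files plus the REDO logs.

    log_files=2 is an assumption, chosen so that 5 MB logs total 10 MB.
    """
    return tables * table_min_mb + log_file_mb * log_files


print(innodb_disk_footprint_mb(1))                 # one table, 5 MB logs
print(innodb_disk_footprint_mb(1, log_file_mb=1))  # one table, 1 MB logs
```

For a single table that is 20 MB with the default logs versus 12 MB with
1 MB logs, so the logs are not where most of the space goes.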

The part that bothers me is that the Akonadi folks are basically aware of
the situation, and feel justified in claiming [2] that 100+ MB of disk is
reasonable.  Frankly, if you ask even an arm-chair DBA if using InnoDB with
these parameters is appropriate for per-user PIM management, they'll look
at you like you're crazy--which is, from what I can tell, the underlying
reason for so much of the dislike of KDE 4.4.

I can't imagine that SQLite was really _so bad_ a target for low-usage
PIM workloads that the Akonadi folks couldn't have just written a plugin
for it some time ago and filed some bugs.  After all, Firefox uses it rather
extensively; it seems like it would've been a perfect fit.  But that's
another story, and we just have to make do with what we have right now.

[1] http://mysqldump.azundris.com/archives/78-Configuring-InnoDB-An-InnoDB-tutorial.html
[2] http://techbase.kde.org/Projects/PIM/Akonadi#Akonadi_needs_too_much_space_in_my_home_directory.21

