e2dis status update
>>>>> Ivan Shmakov <firstname.lastname@example.org> writes:
> The news is that both the disassembly (e2dis) and reassembly (imrt)
> tools are now working (but read below for a caution) and available
> from their public Git repository  at Gitorious!
>  https://gitorious.org/e2dis/e2dis-devel
> Unfortunately, the performance of the image reassembly tool (imrt) is
> extremely poor for filesystems larger than a few MiB in size.
Long story short: the changes I've made over the past month have
made imrt significantly faster. I haven't done much testing, but
it looks like an order-of-magnitude improvement!
> (And it seems that there may be subtle bugs, too.)
The bug I was referring to is that the version of libgcrypt I
use apparently doesn't support having as many as 30 or 40 digest
objects open at the same time. With the digest removal logic
re-done properly (8f56056d), it no longer seems like a big
issue.
> As with jigdo-file(1), imrt doesn't rely on filenames, and instead
> “guesses” the output chunks the files passed to it correspond to by
> comparing the hashes (SHA-1 and SHA256 as of a726267a.) However,
> such a comparison is currently implemented in a straightforward yet
> suboptimal (as in: totally dumb) way, leading to the problem.
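The straightforward comparison described above can be sketched
roughly as follows. This is a hedged illustration only: e2dis is
not written in Python, and the function and variable names here
are invented, not its actual interfaces.

```python
import hashlib

def file_digests(path, chunk_size=1 << 16):
    """Compute SHA-1 and SHA-256 of a file in a single pass."""
    sha1, sha256 = hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            sha1.update(block)
            sha256.update(block)
    return sha1.hexdigest(), sha256.hexdigest()

def match_chunks(paths, recorded):
    """Map recorded chunks to source files by digest alone.

    `recorded` maps (sha1, sha256) pairs to chunk identifiers;
    every input file is hashed and compared against every
    recorded pair, with no regard for filenames.
    """
    matches = {}
    for path in paths:
        pair = file_digests(path)
        if pair in recorded:
            matches[recorded[pair]] = path
    return matches
```

The point of the sketch is that matching is purely content-based,
so renamed files still resolve, at the cost of hashing and
looking up every candidate.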
It was improved considerably in the commits starting at 2acc4706.
Then, I've switched to using prepared statements extensively
(a51bf977, 5d2f278c), thus reducing the time to complete a
simple 64 MiB test image reassembly by roughly 20%.
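For readers unfamiliar with the technique: a prepared
(parameterized) statement is compiled once and then re-executed
with different values, instead of being re-parsed on every
lookup. A minimal sketch using Python's sqlite3, with a made-up
schema; the actual e2dis tables are not shown here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE digests (chunk INTEGER PRIMARY KEY,"
            " sha1 BLOB, sha256 BLOB)")

# executemany() reuses one compiled INSERT for every row,
# rather than parsing a fresh SQL string per insert.
rows = [(0, b"\x01" * 20, b"\x02" * 32),
        (1, b"\x03" * 20, b"\x04" * 32)]
con.executemany("INSERT INTO digests VALUES (?, ?, ?)", rows)

# The same applies to lookups: bind new parameters to one
# compiled SELECT instead of building SQL strings in a loop.
lookup = "SELECT chunk FROM digests WHERE sha1 = ? AND sha256 = ?"
hit = con.execute(lookup, (b"\x03" * 20, b"\x04" * 32)).fetchone()
```

With thousands of chunk lookups per image, avoiding repeated SQL
parsing is a plausible source of the ~20% saving mentioned above.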
Finally, I've implemented “cue sheet” support (abd326b5,
64743751.) Now, e2dis recurses over the filesystem's
directories and records the filenames for all the “chunks” whose
digests are recorded. Conversely, imrt uses this table to
narrow the comparison of the files being processed to only those
digests. If that fails, it still falls back to a full search.
For the test image, this change reduces the time by a further
75% or so!
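The cue-sheet lookup strategy might be sketched like this (again
an invented illustration, with a single digest for brevity; the
real tool records both SHA-1 and SHA-256):

```python
import hashlib
import os

def sha1_of(path):
    # One digest is enough to show the lookup strategy.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def find_chunk(path, cue, digest_to_chunk):
    """Resolve `path` to a chunk identifier.

    `cue` maps recorded basenames to the digests recorded under
    that name, so an unrenamed file is matched with a single
    comparison; a full scan of `digest_to_chunk` remains as the
    fallback for renamed files.
    """
    d = sha1_of(path)
    for candidate in cue.get(os.path.basename(path), ()):
        if candidate == d:
            return digest_to_chunk[d]
    # Fallback: filename unknown or renamed, so search every
    # recorded digest.
    return digest_to_chunk.get(d)
```

The fallback preserves the filename-independence quoted above:
the cue sheet is only a shortcut, never a requirement.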
As for the missing parts: there's still virtually no command
line interface, and I hope to fix that within a month or so,
making a proper release shortly after.
Nor is there any documentation, or handy tools to maintain the
databases created. In particular, while the format allows a
single database to hold indices for several images (say, images
for different platforms), there are as yet no tools to either
“split” such a database or “join” several together.
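For what it's worth, one plausible shape for a “split” tool,
assuming the index is SQLite-based, would be to ATTACH a fresh
database and copy over the rows keyed by one image. The
chunks(image, seq, sha1) schema below is entirely invented for
the sketch; the real e2dis format may look nothing like it:

```python
import sqlite3

def split_image(src_path, dst_path, image_id):
    """Copy the rows for one image into a fresh database file."""
    con = sqlite3.connect(src_path)
    # ATTACH lets one connection address both files at once.
    con.execute("ATTACH DATABASE ? AS dst", (dst_path,))
    con.execute("CREATE TABLE dst.chunks (image INTEGER,"
                " seq INTEGER, sha1 BLOB)")
    con.execute("INSERT INTO dst.chunks"
                " SELECT * FROM chunks WHERE image = ?",
                (image_id,))
    con.commit()
    con.close()
```

A “join” tool would be the same idea in reverse: ATTACH each
source database and INSERT its rows into a combined one.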
FSF associate member #7257