
e2dis status update



>>>>> Ivan Shmakov <ivan@gray.siamics.net> writes:

[…]

 > The news is that both the disassembly (e2dis) and reassembly (imrt)
 > tools are now working (but read below for a caution) and available
 > from their public Git repository [1] at Gitorious!

 > [1] https://gitorious.org/e2dis/e2dis-devel

[…]

 > Unfortunately, the performance of the image reassembly tool (imrt) is
 > extremely poor for filesystems larger than a few MiB.

	Long story short: the changes I've made over the past month have
	made imrt significantly faster.  I haven't done much testing, but
	it looks like an order-of-magnitude improvement!

 > (And it seems that there may be subtle bugs, too.)

	The bug I was referring to: the version of libgcrypt I use
	apparently doesn't cope with as many as 30 or 40 digest objects
	existing at the same time.  With the digest removal logic
	redone properly (8f56056d), it no longer seems to be a big
	issue.

 > As with jigdo-file(1), imrt doesn't rely on filenames, and instead
 > “guesses” the output chunks the files passed to it correspond to by
 > comparing the hashes (SHA-1 and SHA-256 as of a726267a.)  However,
 > such a comparison is currently implemented in a straightforward yet
 > suboptimal (as in: totally dumb) way, leading to the problem.

	The comparison was improved considerably in the commits from
	2acc4706 to b5009c14 (roughly.)
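
	To illustrate why the way the comparison is done matters, here
	is a minimal Python sketch.  It is not the imrt code and makes
	no claim about what the commits above actually do; the chunk
	list layout and all the function names are hypothetical.  It
	contrasts scanning every recorded chunk for each file with
	looking the digest up in an index built once.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def match_naive(file_digest, chunks):
    """Scan every (offset, digest) pair: O(number of chunks) per file."""
    return [offset for offset, digest in chunks if digest == file_digest]


def build_index(chunks):
    """Build a digest -> [offsets] mapping once, up front."""
    index = {}
    for offset, digest in chunks:
        index.setdefault(digest, []).append(offset)
    return index


def match_indexed(file_digest, index):
    """Look the digest up directly: roughly constant time per file."""
    return index.get(file_digest, [])
```

	Both functions return the same offsets; only the cost per
	file differs once the number of chunks grows.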

	Then, I switched to using prepared statements extensively
	(a51bf977, 5d2f278c), which reduced the time to reassemble a
	simple 64 MiB test image by roughly 20%.
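
	For readers unfamiliar with the technique, a small Python
	sketch of what “prepared statements” buy.  Everything here is
	an assumption made for illustration: the use of SQLite, the
	table and column names, and the helper functions; none of it
	is taken from the e2dis sources.  The point is that the SQL
	text stays constant and only the bound parameters change, so
	the statement is compiled once and reused.

```python
import sqlite3

# Constant SQL text with a "?" placeholder.  Python's sqlite3 module keeps
# a per-connection cache of compiled statements keyed by the SQL string,
# so repeated calls with this text reuse the same prepared statement.
LOOKUP = "SELECT pos FROM chunk WHERE digest = ? ORDER BY pos"
INSERT = "INSERT INTO chunk (digest, pos) VALUES (?, ?)"


def open_db():
    """Create an in-memory database with a hypothetical chunk table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunk (digest TEXT NOT NULL, pos INTEGER NOT NULL)")
    db.execute("CREATE INDEX chunk_digest ON chunk (digest)")
    return db


def add_chunks(db, rows):
    """Insert many (digest, pos) rows through one prepared statement."""
    db.executemany(INSERT, rows)
    db.commit()


def offsets_for(db, digest):
    """Return the positions of all chunks with the given digest."""
    return [row[0] for row in db.execute(LOOKUP, (digest,))]
```

	Building the query by string concatenation instead would
	force the SQL to be parsed again on every call.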

	Finally, I've implemented “cue sheet” support (abd326b5,
	64743751.)  Now, e2dis recurses over the filesystem's
	directories and records the filenames for all the “chunks” whose
	digests are recorded.  In turn, imrt uses this table to narrow
	the comparison for each file being processed to only those
	digests.  If that fails, it still falls back to a full search.
	For the test image, this change reduces the time by a further
	75% or so!
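
	The lookup order described above can be sketched in Python as
	follows.  This is an illustration only: the data structures
	(a dict keyed by filename for the cue sheet, a list of
	(offset, digest) pairs for the chunks) and the function name
	are hypothetical, and matching on the final path component is
	an assumption, not something the format is known to require.

```python
import os


def locate(path, digest, cue, chunks):
    """Find the chunk offsets matching a file's digest.

    cue    -- hypothetical cue sheet: filename -> [(offset, digest), ...]
    chunks -- every recorded chunk as (offset, digest) pairs

    Returns (how, offsets), where how is "cue" if the filename hint was
    enough and "full" if the full search had to be used.
    """
    name = os.path.basename(path)

    # First try only the chunks recorded under this filename.
    hits = [off for off, d in cue.get(name, []) if d == digest]
    if hits:
        return "cue", hits

    # The file may have been renamed or may be unknown to the cue sheet:
    # fall back to comparing against every recorded digest.
    hits = [off for off, d in chunks if d == digest]
    return "full", hits
```

	A renamed file still gets matched, just more slowly, since
	the filename is only ever used as a hint.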

	As for the missing parts: there's still virtually no command
	line interface, and I hope to fix that within a month or so,
	making a proper release shortly after.

	There is no documentation either, nor handy tools to maintain
	the databases created.  In particular, while the format allows
	a single database to hold indices for several images (say,
	images for different platforms), there are no tools to either
	“split” such a database or “join” a few together.

[…]

-- 
FSF associate member #7257

