
Re: [debian-knoppix] cloop file format change proposal



Hello Bernhard,

I've been pondering your proposal.

At Mon, Jan 13, 2003 at 10:28:34AM +0100, bernhard@152146.vserver.de wrote:
> - a version number is included in the header

This is a Good Thing - we all agree, I think. Klaus mentioned the #!/bin/sh
start of the header (a Good Thing too); we could start like this:
#!/bin/sh
#Cloop Version: x.y.z
insmod .....
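
A reader of such a header could pick the version out of the comment line
before deciding how to interpret the rest of the file. Below is a minimal
Python sketch of that check. It assumes the version line looks exactly like
the "#Cloop Version: x.y.z" example above with three numeric parts; the
function name parse_cloop_version is made up for illustration and is not part
of any existing cloop tool.

```python
import re

# Assumption: the version line is "#Cloop Version: x.y.z" with numeric parts.
VERSION_RE = re.compile(r"^#Cloop Version: (\d+)\.(\d+)\.(\d+)\s*$")


def parse_cloop_version(header_text):
    """Return (x, y, z) from the first matching header line, or None."""
    for line in header_text.splitlines():
        match = VERSION_RE.match(line)
        if match:
            return tuple(int(part) for part in match.groups())
    return None  # no version line found: old format or not a cloop file


header = "#!/bin/sh\n#Cloop Version: 2.0.1\ninsmod .....\n"
version = parse_cloop_version(header)
```

A header without the version line yields None, which a tool could treat as
"old format".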

> - the index is at the end of the file, followed by
> - a number that indicates the exact position of the start of the index.
> while Klaus objected:
> think that stuff like version information or indexes really belong to
> the beginning of a file, not at the end, so you still can access them if
> later parts of the file are corrupted.
> 
> seek times are another issue.
> 
> while file corruption should be better detected by MD5-sums and the
> associated data is not available then anyway there are some alternatives
> (2 and 3) below.
> 
> 
> looking at the final data layout following the header this would look like
> block 1 data, block 2 data, ..., block n data, block 1 size, ..., 
> block n size, offset of block 1 size (64 bit)

Please note that the current format doesn't store *sizes*, it stores
*offsets*. If you stored sizes instead, the format could easily handle
extremely large cloop files, as a size is never larger than some 655525
(or something like that; a block that doesn't compress at all has this
size). The disadvantage is that reading the index is not enough; the cloop.o
module would need to calculate the absolute offsets from the sizes. cloop.o
would need a major rewrite, but each index entry would fit in a size field
of 16 bits plus a few.
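
To illustrate the extra step a size-based index would need, here is a small
Python sketch that turns a list of block sizes into absolute offsets with a
running sum. The function name offsets_from_sizes and the data_start
parameter (where the first block begins, i.e. the header length) are
assumptions for the example, not anything from cloop.o.

```python
from itertools import accumulate


def offsets_from_sizes(sizes, data_start=0):
    """Return absolute offsets computed from per-block sizes.

    Entry i is the start of block i; the final entry is the end of the
    last block, so the list has len(sizes) + 1 entries.
    """
    return list(accumulate(sizes, initial=data_start))


sizes = [100, 250, 75]
offsets = offsets_from_sizes(sizes, data_start=128)
# offsets is [128, 228, 478, 553]
```

The sum only has to be computed once, when the index is loaded; after that,
lookups are as cheap as with stored offsets.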

A 64-bit cloop would probably use indexes again - thus limiting the maximum
size but speeding up the retrieval of the index.

[...]
> Alternative 2 would be
> block 1 size, block 1 data, block 2 size, block 2 data, ..., block n data, 
> 0
> +easy to code
> +low mem
> +integrity with end missing or corrupted

I think you mean "-integrity" here?

> -bad seek times

This would be a bad design. You would need to read the CD like a sort of
tape before you could do random access. Say no ;-)

> Alternative 3 would be
> value of m, block 1 size, block 2 size, ..., block m size, block 1 data, 
> block 2 data, ..., block m data, block m+1 size, block m+2 size, ...,
> block n size, 0
> with m<n e.g. m=511 to m=4096 seem reasonable

Hmm. I'm not sure.

What about:

header - size0 data0 - size1 data1 .... sizeN dataN - 
     indexheader index0 index1 ... indexN - indexsize, or rather the number of blocks.

The problem we seem to have with the index at the end is that it's not safe
- although I've searched online documentation like the CDR-faq and
couldn't find any information about this. The real problem is probably that a
disk with a broken index is totally unusable. However, if we code it as
above, a disk with a broken index *would* still be usable - albeit very
slowly. You would do best to copy it with the then-existing copy_compressed_fs
utility, which would simply read all blocks and recreate the index.
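
The recovery pass could look roughly like the Python sketch below: walk the
size-prefixed blocks one after another and note where each block's data
starts. Everything about the encoding here is an assumption for the sake of
the example: a 32-bit big-endian size field, and a size of 0 marking the end
of the block area (borrowed from Bernhard's alternative 2). The name
rebuild_index is hypothetical as well.

```python
import io
import struct

SIZE_FMT = ">I"  # assumption: 32-bit big-endian size field before each block
SIZE_LEN = struct.calcsize(SIZE_FMT)


def rebuild_index(stream, data_start=0):
    """Scan size-prefixed blocks and return a list of (offset, size) pairs.

    The offset is where the block's data begins, just after its size field.
    The data itself is skipped, not read or verified.
    """
    index = []
    stream.seek(data_start)
    while True:
        raw = stream.read(SIZE_LEN)
        if len(raw) < SIZE_LEN:
            break  # file ends here: keep whatever was found so far
        (size,) = struct.unpack(SIZE_FMT, raw)
        if size == 0:
            break  # assumed end-of-blocks marker
        index.append((stream.tell(), size))
        stream.seek(size, io.SEEK_CUR)  # skip over the block data
    return index


# Tiny fake image: 4-byte header, two blocks, then the 0 marker.
image = (b"HDR!"
         + struct.pack(SIZE_FMT, 3) + b"abc"
         + struct.pack(SIZE_FMT, 5) + b"hello"
         + struct.pack(SIZE_FMT, 0))
index = rebuild_index(io.BytesIO(image), data_start=4)
# index is [(8, 3), (15, 5)]
```

This is exactly the tape-like read that makes alternative 2 slow for normal
use, but as a one-time repair it is acceptable.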

Your m/n idea could be useful in the sense that an unreadable index loses
fewer blocks, i.e. if you have m index blocks, you would lose 1/m of all
your blocks.

> btw: you could have mentioned somewhere
> http://packages.debian.org/unstable/misc/cloop-src.html
> or some other means of obtaining the latest version (e.g. CVSRoot)

The problem was that the latest version had 2 instances: 1.4-valentijn and
1.4-klaus. The latter went into cloop-src I think (I didn't look). My latest
version is at http://projects.openoffice.nl/downloads/compressloop/ and I'd
suggest that this one go into cloop.

V.
-- 
http://www.openoffice.nl/   Open Office - Linux Office Solutions
Valentijn Sessink  valentyn+sessink@nospam.openoffice.nl

