Re: [debian-knoppix] cloop file format change proposal

To: Valentijn Sessink <valentyn+knoppix@openoffice.nl>
Cc: debian-knoppix@linuxtag.org
Subject: Re: [debian-knoppix] cloop file format change proposal
From: Bernhard Wiedemann <bernhard@lsmod.de>
Date: Fri, 17 Jan 2003 13:14:27 +0100 (CET)
Message-id: <Pine.LNX.4.44.0301171159330.13011-100000@152146.vserver.de>
In-reply-to: <20030115210840.GA31678@openoffice.nl>

Hello Valentijn

On Wed, 15 Jan 2003, Valentijn Sessink wrote:

> > - a version number is included in the header
> 
> This is a Good Thing - we all agree I think. Klaus mentioned the #!/bin/sh
> start of the header (GoodThing too); we could start like
> #!/bin/sh
> #Cloop Version: x.y.z
> insmod .....
Good idea. The only alternative I can think of would be
#!/bin/sh
insmod version=x.y.z ...
which would save us some easy text parsing, but would throw
warnings on older cloop modules that do not have that parameter.
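
To show how little text parsing the comment-line variant needs, here is a
minimal sketch in Python. The exact "#Cloop Version: x.y.z" line is only the
proposal quoted above, and the function name, the sample version number and
the sample header are made up for illustration:

```python
import re

# Assumed header layout (proposal only): a shell script with a version comment line.
VERSION_RE = re.compile(rb"^#Cloop Version: (\d+)\.(\d+)\.(\d+)\s*$", re.MULTILINE)

def parse_cloop_version(header: bytes):
    """Return (major, minor, patch), or None if no version line is present."""
    match = VERSION_RE.search(header)
    if match is None:
        return None  # e.g. an older image that predates the version line
    return tuple(int(part) for part in match.groups())

# Hypothetical header; the insmod arguments are left elided as in the proposal.
example_header = b"#!/bin/sh\n#Cloop Version: 2.0.1\ninsmod ...\n"
print(parse_cloop_version(example_header))
```

An older tool that does not know the line simply sees a shell comment, which
is why this variant throws no warnings.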

> > looking at the final data layout following the header this would look like
> > block 1 data, block 2 data, ..., block n data, block 1 size, ..., 
> > block n size, offset of block 1 size (64 bit)
> 
> Please note that the current format doesn't store *sizes*, it stores
> *offsets*. If you store sizes, the current format could easily handle
> extremely large cloop files, as the size is never larger than some 655525
> (or something alike; a compressed block that has no compression has this
> size). The disadvantage is that reading the index is not enough; the cloop.o
> module would need to calculate the absolute offsets from the sizes. The
> cloop.o would need a major rewrite, but the indexes could be handled by a
> 16+some bit size entry.

I'm well aware of this difference and would prefer storing sizes, since the
computation is easy and fast. (Btw: the number you meant is 65536 -> 65562.)
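
The computation I mean is just a running sum over the stored sizes. A small
sketch in Python, where the function name and the data_start parameter (the
offset of the first block's data) are my own illustrative choices:

```python
from itertools import accumulate

def offsets_from_sizes(sizes, data_start=0):
    """Return the absolute start offset of every block, plus the end offset.

    Entry i is where block i begins; the final entry is where the last
    block ends, so block i occupies offsets[i] .. offsets[i + 1].
    """
    return list(accumulate(sizes, initial=data_start))

# Three compressed blocks of 100, 250 and 80 bytes after a 128-byte header.
offsets = offsets_from_sizes([100, 250, 80], data_start=128)
print(offsets)
```

This is one pass over the index at load time, after which lookups are as
cheap as with stored offsets.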

> A 64-bit cloop would probably use indexes again - thus limiting the maximum
> size but speeding up the retrieval of the index.
After all, the whole point of the cloop code is to save storage space in
the first place. It may even be faster to read 16+n bits for each of 20000
to 40000 blocks and generate 64-bit indices in RAM than to read 40000*64
bits. Today's processors are very fast, while CD-ROMs are slow.
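
To put rough numbers on that, here is a small Python calculation. The value
n=8 (i.e. 24-bit size entries) is only an assumption for the example, as is
the helper name:

```python
def index_bytes(num_blocks: int, bits_per_entry: int) -> int:
    """Bytes needed for an index of num_blocks entries, rounded up to whole bytes."""
    return (num_blocks * bits_per_entry + 7) // 8

blocks = 40000
full = index_bytes(blocks, 64)     # 64-bit absolute offsets
compact = index_bytes(blocks, 24)  # 16+n bit sizes, with n=8 assumed
print(full, compact, full - compact)
```

Under these assumptions the size-based index is 120000 bytes instead of
320000, so 200000 fewer bytes have to come off the CD.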


> > Alternative 3 would be
> > value of m, block 1 size, block 2 size, ..., block m size, block 1 data, 
> > block 2 data, ..., block m data, block m+1 size, block m+2 size, ...,
> > block n size, 0
> > with m<n e.g. m=511 to m=4096 seem reasonable
> 
> Hmm. I'm not sure.
> 
> What about:
> 
> header - size0 data0 - size1 data1 .... sizeN dataN - 
>      indexheader index0 index1 ... indexN - indexsize c.q. number of blocks.
That looks like a combination of alternatives 1 and 2 and would work.
An MD5 sum would also help to guard against data corruption, as would
Eight-to-Fourteen encoding (already used by CD-ROM) or a Reed-Solomon
coder, but that is theoretical again.
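
For the MD5 idea, a per-block check could look roughly like the Python sketch
below. Where the digests would be stored in the file is not decided anywhere
in this thread, so the two helpers are purely illustrative:

```python
import hashlib

def block_digest(data: bytes) -> bytes:
    """Return the 16-byte MD5 digest of one compressed block."""
    return hashlib.md5(data).digest()

def verify_block(data: bytes, expected_digest: bytes) -> bool:
    """True if the block read from the medium still matches its stored digest."""
    return hashlib.md5(data).digest() == expected_digest

stored = block_digest(b"compressed block payload")
print(verify_block(b"compressed block payload", stored))   # intact block
print(verify_block(b"compressed block pay1oad", stored))   # one corrupted byte
```

Note that this only detects corruption; unlike a Reed-Solomon code it cannot
repair the block.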

> The problem we seem to have with the index at the end is that it's not safe
> - although I've searched online documentation like the CDR-faq and I
> couldn't find any information about this. Now the problem is probably that a
> disk with a broken index is totally unusable. However, if we code it like
> above, a broken index would mean that a disk *would* be usable - albeit very
> slowly. You would best copy it with the then-existing copy_compressed_fs
> utility that would simply read all blocks and recreate the index.
That might still create a faulty copy that you will not want to use.
In yesterday's tests I got read errors at 700MB and at 3MB on my CD-RW,
so any part of the medium may go bad and we cannot ensure intact data.
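
To make the recovery scan concrete, here is a Python sketch of recreating the
index from the "size data size data ..." layout. The 4-byte big-endian size
field and the zero size as end marker are assumptions for the example only;
the thread does not fix either:

```python
import struct

def rebuild_index(image: bytes, data_start: int):
    """Walk the size/data records and return the offset of each block's data.

    Assumed record layout (illustrative): 4-byte big-endian size, then the
    block data. A size of 0, or a record running past the end, stops the scan.
    """
    offsets = []
    pos = data_start
    while pos + 4 <= len(image):
        (size,) = struct.unpack_from(">I", image, pos)
        if size == 0 or pos + 4 + size > len(image):
            break
        offsets.append(pos + 4)  # data begins right after its size field
        pos += 4 + size          # jump to the next record's size field
    return offsets

image = (b"HDR"
         + struct.pack(">I", 3) + b"abc"
         + struct.pack(">I", 2) + b"de"
         + struct.pack(">I", 0))
print(rebuild_index(image, data_start=3))
```

The scan trusts every size field it reads, so a single damaged size field
derails everything after it, which is exactly why the copy may be faulty.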

> Your m/n idea could be useful in the sense that an unreadable index loses
> fewer blocks, i.e. if you have m index blocks, you would lose 1/m of all
> your blocks.
Using my m from above we would have ceil(n/m) index blocks. Still, if m is
too low we get bad seek times, while a large m (e.g. m=n) would need a lot
of RAM again; then your alternative (#1) would be better again.
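
The trade-off can be tabulated quickly. In this Python sketch n=40000 blocks
and 3 bytes per size entry are assumed figures, and the helper name is mine:

```python
def index_block_count(n: int, m: int) -> int:
    """Number of index blocks when each one covers m data blocks: ceil(n/m)."""
    return -(-n // m)  # integer ceiling division

n = 40000            # assumed total number of data blocks
bytes_per_entry = 3  # assumed size of one index entry

for m in (511, 4096, n):
    groups = index_block_count(n, m)
    ram = m * bytes_per_entry  # entries held in RAM for one index block
    print(f"m={m}: {groups} index blocks, {ram} bytes per index block")
```

A small m means many index blocks scattered over the disc (more seeks), while
m=n collapses to a single index that has to be held in RAM as a whole.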

> > btw: you could have mentioned somewhere
> > http://packages.debian.org/unstable/misc/cloop-src.html
> > or some other means of obtaining the latest version (e.g. CVSRoot)
> 
> The problem was that the latest version had 2 instances: 1.4-valentijn and
> 1.4-klaus. The latter went into cloop-src I think (I didn't look). My latest
> version is at http://projects.openoffice.nl/downloads/compressloop/ and I'd
> suggest this one to go in cloop.
For me, _some_ pointer to some recent version would have sufficed.


Unrelated to this topic, but it might be useful for optimising the Knoppix
ISO sort order too:
http://www.lsmod.de/dl/trace-0.5.8.tar.gz
This is a tracer for open syscalls that may be insmoded at the beginning of
miniroot's linuxrc. Make sure to have syslog/klogd running early and
properly, because there will be many "trace: /path/to/file" messages.
Compile using
make all KERNELDIR=/usr/src/linux-2.4.20-xfs

best regards
Bernhard M. Wiedemann
__________
design has to be chosen carefully

