
Re: thoughts on architectures

On Mon, Feb 11, 2002 at 11:07:18AM +0100, David Schmitt wrote:
> On Mon, Feb 11, 2002 at 01:25:51AM -0500, utsl@quic.net wrote:
> > 	4. Drop keeping metadata in package filenames. Make them just a unique
> > 	   string, assigned when the package is installed into the archive.
> > 	   That gets rid of the need to add something to the filename to
> > 	   distinguish i386 from sparc packages. Just use the Packages files,
> > 	   and the control fields from the packages themselves. I'd be in
> > 	   favour of going further: use one Packages file, and determine
> > 	   available packages based on the tags your kernel and libc support.
> Which would lead to the problem, that in the pool/ there would be stuff
> like: 
> pool/main/p/package/:
> 	package_Version-1_abcde.deb
> 	package_Version-1_lsahd.deb
> 	package_Version-1_iorzq.deb
> 	package_Version-1_mbmnb.deb
> 	package_Version-1_poiuz.deb
> 	package_Version-1_mjuhb.deb
> Essentially rendering manual download impossible.

Actually, I was thinking of going a bit further than that and dropping the
package names and versions from the filenames too. Just think of a squid

I don't see manual downloads or partial mirroring as a huge problem. It would
simply mean that there'd need to be tools to locate the files. For manual
downloads, a CGI that returns a URL for the right file could work. Or possibly
just a slightly enhanced apt or similar tool.
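
As a rough sketch of what such a lookup tool might do: given a package name,
a version, and the tags a machine supports, return the URL of the matching
file. The index layout, the mirror URL, the pool paths, and the function name
locate() are all made up for illustration; this is not an existing Debian
tool or apt interface.

```python
# Hypothetical index: (package, version, tags) -> opaque filename in the pool.
# None of these names or paths come from a real archive.
INDEX = {
    ("package", "1.0-1", frozenset({"i686", "glibc", "mmx"})): "pool/00/000017.deb",
    ("package", "1.0-1", frozenset({"sparc", "netbsd4"})): "pool/00/000042.deb",
}


def locate(base_url, package, version, supported_tags):
    """Return the URL of a file whose tags are all supported, or None."""
    supported = set(supported_tags)
    # Sort by path so the result is stable if several files match.
    for (name, ver, tags), path in sorted(INDEX.items(), key=lambda kv: kv[1]):
        if name == package and ver == version and tags <= supported:
            return base_url.rstrip("/") + "/" + path
    return None
```

A CGI would only need to wrap locate() and print the result; the user never
has to see or guess the opaque filename.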

> This is a question to Marcus Brinkmann: 
> 	How can one distinguish 
> 	package_1.0-1.deb (i686,glibc,mmx) and
> 	package_1.0-1.deb (sparc,netbsd4)
> 	in the pool (i.e. filesystem)?
> I know, in your doc, you don't explicitly specify an 'encoding' for this
> dependency information, but people (as simpleminded as I am) would think
> about some extra entry in the Depends: field, which wouldn't help much
> with filesystemlayout in the mirrors as Philip Charles mentioned in
> regard to partial mirrors and things as pointed out above.

That's why I'd like to take those parts out of the filename, and put together
tools to handle it. It should be a matter of giving a partial mirror program
a pattern you want to mirror with, like "sparc and netbsd," and it generates
a list of files to mirror. That's fairly painless, and filesystem layout can
be just a matter of keeping directories from getting too many files, and
filesystems from overfilling.
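
A minimal sketch of that pattern-to-file-list step, assuming the archive
metadata is available as a list of records with a tag list each. The record
format, the tag names, and the functions parse_pattern() and
files_to_mirror() are hypothetical, and the pattern syntax handles only tags
joined with "and".

```python
def parse_pattern(pattern):
    """Turn 'sparc and netbsd' into the set {'sparc', 'netbsd'}."""
    return {word for word in pattern.lower().split() if word != "and"}


def files_to_mirror(records, pattern):
    """Return the sorted filenames whose tags include every wanted tag."""
    wanted = parse_pattern(pattern)
    return sorted(r["file"] for r in records if wanted <= set(r["tags"]))


# Made-up sample metadata; the filenames are opaque on purpose.
RECORDS = [
    {"file": "000017.deb", "tags": ["i686", "glibc", "mmx"]},
    {"file": "000042.deb", "tags": ["sparc", "netbsd"]},
    {"file": "000043.deb", "tags": ["sparc", "linux", "glibc"]},
]
```

The resulting list could then be handed to whatever program does the actual
transfer.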

I maintained a program that tracked about 450,000 files, all of which
had information encoded in the filenames. (Ok, it didn't start that large,
but it got there fast.) It was a nightmare until I moved the metadata into a
database, and renamed every file to a number. I could then do lookups in the
database table, and locate the files much more quickly. (I gained an order
of magnitude speed increase in my user interface.) I also was able to balance
out the directories. I had some directories with 10,000 files, and others with
a few hundred. When I was done, I had 4112 directories, with about 100 files
in each.
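
To illustrate the idea (not the program I actually ran), here is a small
sketch using SQLite: the metadata lives in an indexed table, each file is
named after its row number, and the directory is derived from that number so
every directory holds at most 100 files. The table layout, the column names,
and the helper functions are assumptions made for the example.

```python
import sqlite3

FILES_PER_DIR = 100  # assumed bucket size, chosen to match "about 100 files"


def path_for(file_id):
    """Map a numeric id to a balanced path such as '0012/1234.deb'."""
    return "%04d/%d.deb" % (file_id // FILES_PER_DIR, file_id)


def open_db():
    """Create an in-memory metadata table with an index for lookups."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files ("
        "id INTEGER PRIMARY KEY, package TEXT, version TEXT, arch TEXT)"
    )
    db.execute("CREATE INDEX files_by_package ON files (package, version, arch)")
    return db


def add_file(db, package, version, arch):
    """Record the metadata and return the path the file should be stored at."""
    cur = db.execute(
        "INSERT INTO files (package, version, arch) VALUES (?, ?, ?)",
        (package, version, arch),
    )
    return path_for(cur.lastrowid)


def find(db, package, arch):
    """Look up stored paths by metadata instead of parsing filenames."""
    rows = db.execute(
        "SELECT id FROM files WHERE package = ? AND arch = ? ORDER BY id",
        (package, arch),
    )
    return [path_for(row[0]) for row in rows]
```

Because the path depends only on the id, directories fill evenly no matter
how the package names are distributed.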

I think a similar approach would be helpful for Debian. Database tables can
be indexed, and can handle much more complicated queries than filesystems can.
Having information in the filename really doesn't help with much of anything except
manual downloads, and I think that can be managed.
