
Re: [PROPOSAL] definition of deb binary files



Andreas Barth <aba@not.so.argh.org> writes:

> Hi,
> 
> I made a proposal for an updated definition of the deb format. I based it on
> the deb manpage (part of dpkg-dev) and on reverse engineering of
> dpkg-deb/build.c. I hope I've written the standard in a correct and easily
> understandable way. I deliberately did not add anything about
> signatures etc.; I just wanted to document what we currently have.
> Discussion about additions should (IMHO) be kept separate.
> 
> IMHO this definition should become part of the policy; I propose
> either a new chapter 12 or an addition to chapter 3 (Binary packages),
> whichever seems more appropriate. This also means that some parts of
> Appendix B could be removed on this occasion.
> 
> I'm also Cc'ing debian-devel and a bug against apt-utils, from which I got
> some of the information. Please restrict crossposting in replies where
> useful.
> 
> 
> Cheers,
> Andi
> 
> 
> DESCRIPTION
> 
> The .deb format is the Debian binary package file format. It is understood
> by dpkg 0.93.76 and later, and is generated by default by all versions
> of dpkg since 1.2.0 and all i386/ELF versions since 1.1.1elf.
> 
> The format described here has been used since Debian 0.93; details of the old
> format are described in deb-old(5).
> 
> 
> OVERALL FORMAT
> 
> The file is an ar archive in a certain ar version and with a magic number
                            ^^^^^^^^^^^^^^^^^^^^^^^

That's very vague. Explain which version and what that entails. That
also means explaining the limit to short filenames and no spaces.
Are there more than the BSD and SysV flavours in widespread use (if limited
to short names)?

> of !<arch>. Following the robustness principle, extracting tools should be
> able to cope with as many of the different ar versions as possible; if they
> don't, it's at most a wishlist bug. On the other hand, tools producing
> .deb files MUST only produce strictly standard-conformant files. Any
> other behaviour is a serious bug!
> 
> The first member of the archive is named debian-binary and contains a series
> of lines, separated by newlines. Currently only one line is present, the
> format version number. The 2.0 format is current, and this format is
> described in this document. Programs which read .deb files should be
> prepared for the minor number to be increased and new lines to be present,
> and should ignore these if this is the case. If the major number has a
> value a program doesn't know, an incompatible change has happened, and
> the program should abort with an error.
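As an illustration of these reading rules, here is a minimal Python sketch. The function name parse_debian_binary is hypothetical, and it assumes that 2 is the only major version the reader knows about and that the version line has the form MAJOR.MINOR.

```python
from typing import Tuple

KNOWN_MAJOR = 2  # assumption: this reader only knows format 2.x

def parse_debian_binary(content: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the body of the debian-binary member."""
    # Only the first line carries the version; any further lines are ignored.
    first_line = content.decode("ascii").split("\n", 1)[0].strip()
    major_text, _, minor_text = first_line.partition(".")
    major = int(major_text)
    # A higher minor number is accepted silently.
    minor = int(minor_text) if minor_text else 0
    if major != KNOWN_MAJOR:
        # Unknown major number: incompatible change, abort with an error.
        raise ValueError(f"unsupported .deb format version {first_line!r}")
    return major, minor
```

A reader would call this on the body of the first member and stop processing if it raises.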

That sounds a lot like "man deb" but not exactly. Did you change the
odd bit here and there, or rewrite it?

The rest of man deb should be included too, i.e. control.tar.gz and
data.tar.gz, the naming rules for new members and the _ rule.


The manpage and this documentation should be worded the same.

> OVERALL AR FORMAT
> 
> The ar format is (on purpose) one of the most ancient formats. The
> reason is that it should be possible to unpack .deb files on as many different
> computers as possible. Furthermore, it is also easier for our code
> to handle.
> 
> Any ar file can be written as AR-FILE := HEADER [MEMBER]*.
> The header is the string "!<arch>\n" (not null terminated).
> 
> Each member consists of the member header, the body and, if
> necessary, a padding '\n'. All information in the member header is printable
> ASCII, and each value is padded with spaces on the right side; at least one
> space must be present, so the information must be shorter than the maximum
> number of bytes available. The header is composed of the name (16 bytes), the
> date in seconds since the epoch (1970-01-01 00:00:00 UTC) in decimal notation
> (12 bytes), the uid and gid of the owner in decimal notation (6 bytes each;
> usually both 0), the file member mode in octal notation, beginning with 1 (8
> bytes; usually 100644), the size of the member body (measured without any
> padding of the body; 10 bytes) and the two bytes "`\n".
> After the member header, the member body follows unquoted; if the member body
> has odd length, it is padded with a single '\n', so every member starts on
> an even byte boundary.
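A minimal Python sketch of a strict reader for this layout follows. The names parse_member_header and iter_members are hypothetical; the field offsets are derived from the widths given above (16 + 12 + 6 + 6 + 8 + 10 + 2 = 60 bytes per member header), and the sketch only accepts space-padded short names, not the SysV or BSD variants.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # 16 + 12 + 6 + 6 + 8 + 10 + 2 bytes

def parse_member_header(header: bytes) -> dict:
    """Split one 60-byte member header into its fixed-width fields."""
    if len(header) != HEADER_LEN or header[58:60] != b"`\n":
        raise ValueError("malformed ar member header")
    text = header.decode("ascii")
    return {
        "name": text[0:16].rstrip(" "),
        "mtime": int(text[16:28]),    # decimal seconds since the epoch
        "uid": int(text[28:34]),
        "gid": int(text[34:40]),
        "mode": int(text[40:48], 8),  # octal, e.g. 100644
        "size": int(text[48:58]),     # body size without padding
    }

def iter_members(data: bytes):
    """Yield (name, body) for each member of an in-memory ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("missing !<arch> magic")
    offset = len(AR_MAGIC)
    while offset < len(data):
        header = parse_member_header(data[offset:offset + HEADER_LEN])
        offset += HEADER_LEN
        size = header["size"]
        body = data[offset:offset + size]
        if len(body) != size:
            raise ValueError(f"member {header['name']!r} is truncated")
        yield header["name"], body
        # Odd-sized bodies are followed by one '\n' so the next member is even-aligned.
        offset += size + (size % 2)
```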
> 
> So, the initial member looks like:
> debian-binary   1070194109  0     0     100644  4         `
> 2.0

Can you add a hexdump -C of this? I think that might show it more
clearly. A complete ar file (from byte 0 up to a few bytes into
control.tar.gz) would be even better, to show the header as well.
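A sketch of what such a dump contains, covering the global header and the complete debian-binary member (72 bytes), is below. ar_member and hexdump_c are hypothetical helpers, the mtime is taken from the example above, and hexdump_c only imitates the layout of hexdump -C rather than calling the real tool.

```python
def ar_member(name: bytes, body: bytes, mtime: int = 0, mode: int = 0o100644) -> bytes:
    """Build one member: 60-byte header, body, and newline padding if needed."""
    # The name must leave at least one trailing space and contain no spaces.
    if len(name) >= 16 or b" " in name:
        raise ValueError("member names must be under 16 bytes and contain no spaces")
    header = (
        name.ljust(16)
        + str(mtime).encode("ascii").ljust(12)
        + b"0".ljust(6)  # uid
        + b"0".ljust(6)  # gid
        + format(mode, "o").encode("ascii").ljust(8)
        + str(len(body)).encode("ascii").ljust(10)
        + b"`\n"
    )
    padding = b"\n" if len(body) % 2 else b""
    return header + body + padding

def hexdump_c(data: bytes) -> str:
    """Imitate the hexdump -C layout: offset, 2x8 hex bytes, ASCII column."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    lines.append(f"{len(data):08x}")  # final offset, as hexdump prints it
    return "\n".join(lines)

deb_prefix = b"!<arch>\n" + ar_member(b"debian-binary", b"2.0\n", mtime=1070194109)

if __name__ == "__main__":
    print(hexdump_c(deb_prefix))
```

Run as a script, the sketch prints the following; in a real .deb the control.tar.gz member header would start right at offset 0x48:

00000000  21 3c 61 72 63 68 3e 0a  64 65 62 69 61 6e 2d 62  |!<arch>.debian-b|
00000010  69 6e 61 72 79 20 20 20  31 30 37 30 31 39 34 31  |inary   10701941|
00000020  30 39 20 20 30 20 20 20  20 20 30 20 20 20 20 20  |09  0     0     |
00000030  31 30 30 36 34 34 20 20  34 20 20 20 20 20 20 20  |100644  4       |
00000040  20 20 60 0a 32 2e 30 0a                           |  `.2.0.|
00000048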

> Newer ar features (such as longer file names, filenames with spaces, ...) are
> a violation of this standard; however, extracting tools should try to
> support them as well as possible, but if they do not, that's at
> most a wishlist bug.

I think it's better to follow the may/should/must rules of policy here:

Extracting tools should understand sysv and bsd ar files.
Extracting tools may support long filenames.
Tools providing debs must write policy-conformant debs.

The priority of a bug follows from that.
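Following these may/should/must levels, an extracting tool could normalise member names from the common flavours roughly as in this hedged Python sketch. decode_member_name is a hypothetical helper, and the flavour rules it encodes (SysV/GNU ar terminating short names with '/', BSD ar storing a long name as '#1/<length>' with the name at the start of the body) are assumptions about those formats as commonly implemented; GNU long-name tables ('//' and '/<offset>' references) are not handled.

```python
from typing import Tuple

def decode_member_name(raw_name: bytes, body: bytes) -> Tuple[str, bytes]:
    """Return (name, body) for one member, tolerating common ar flavours."""
    name = raw_name.rstrip(b" ")
    if name.startswith(b"#1/"):
        # BSD long name: the real name occupies the first <length> bytes of
        # the body (possibly NUL-padded) and is counted in the size field.
        length = int(name[3:])
        return body[:length].rstrip(b"\0").decode("ascii"), body[length:]
    if name.endswith(b"/") and name not in (b"/", b"//"):
        # SysV/GNU short name, terminated by '/'.
        return name[:-1].decode("ascii"), body
    # Plain space-padded name as in the proposal; special members such as
    # '/', '//' or a GNU '/<offset>' reference are returned untouched.
    return name.decode("ascii"), body
```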

> DEB 2 ARCHIVE MEMBERS
> 
> Archives with the major number 2 must contain, after the initial member
> debian-binary, the members control.tar.gz and data.tar.gz in exactly this
> order. After these, optional members can follow, but they must have a
> '_' as the first character of their name.
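Producers could check this ordering rule with a small validator such as the following Python sketch. check_member_order is a hypothetical name, and it applies the rule literally: debian-binary, control.tar.gz and data.tar.gz first, in that order, then only names starting with '_'.

```python
from typing import Sequence

REQUIRED_MEMBERS = ["debian-binary", "control.tar.gz", "data.tar.gz"]

def check_member_order(names: Sequence[str]) -> None:
    """Raise ValueError unless `names` follows the format 2 member rules."""
    head = list(names[:3])
    if head != REQUIRED_MEMBERS:
        raise ValueError(f"expected {REQUIRED_MEMBERS} first, got {head}")
    for extra in names[3:]:
        # Any additional member must be marked with a leading underscore.
        if not extra.startswith("_"):
            raise ValueError(f"additional member {extra!r} must start with '_'")
```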
> 
> control.tar.gz is a gzipped tar archive containing the package control
> information, as a series of plain files, of which the file control is
> mandatory and contains the core control information. Please see the Debian
> Packaging Manual, section 2.2 for details of these files. The control
> tarball may optionally contain an entry for `.', the current directory.
> 
> data.tar.gz contains the filesystem archive as a gzipped tar archive.
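Reading the mandatory control file out of control.tar.gz needs only the Python standard library's tarfile module. In this sketch read_control is a hypothetical helper; because the tarball may contain a `.' entry, member names are normalised with posixpath.normpath so that both control and ./control match.

```python
import io
import posixpath
import tarfile

def read_control(control_tar_gz: bytes) -> str:
    """Return the text of the control file inside a control.tar.gz body."""
    with tarfile.open(fileobj=io.BytesIO(control_tar_gz), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Accept both "control" and "./control" as the entry name.
            if member.isfile() and posixpath.normpath(member.name) == "control":
                return tar.extractfile(member).read().decode("utf-8")
    raise ValueError("control.tar.gz has no control file")
```

The same approach works for data.tar.gz, with tarfile iterating over the filesystem archive instead.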

Merge that with the overall format. It's bad to explain debian-binary
up there and the rest down here.
 
> DEB 1 ARCHIVE MEMBERS
> 
> See the man-page deb-old(5) for a definition.

Regards,
        Goswin


