
Re: dpkg -b allowed to build with a non-utf8 control file



Thanks for the feedback. 

On Sat, 8 Apr 2023, 12:32 Guillem Jover, <guillem@debian.org> wrote:
Hi!

On Sun, 2023-03-19 at 17:09:00 +0100, Juanmi Taboada wrote:
> Checking documentation for deb packages, I read that the control file
> should be UTF-8:
>
>    - Reference:
>    https://www.debian.org/doc/debian-policy/ch-controlfields.html
>    - 5.1 Syntax of control files at the end: *"All control files must be
>    encoded in UTF-8."*
>
> I was able to build a non-utf8 package using *dpkg -b*.
>
> This was originally reported in Landscape-Client:
> https://bugs.launchpad.net/landscape-client/+bug/1813442

> That report refers to the first version, '1.0.0.944', of the package "veeam",
> and points out:
> "The strange character is the U+FFFD � REPLACEMENT CHARACTER."
>
> I was able to reproduce the problem in Landscape Client, and I discovered
> that the error came from a wrong encoding used in the control file.
> I created a wrongly encoded description, which reproduced the error on our side.
>
> Nevertheless, it is not a bug in Landscape but in dpkg, which allowed
> building a deb package with a wrongly encoded control file.

The dpkg deb822(5) man page has similar wording, I think mostly
because it was adapted from the Debian policy. So, while I think
settling on UTF-8 as the only supported encoding makes sense, dpkg
itself does not really care, and will work with pretty much any
encoding thrown at it. For the things it does care about, it restricts
itself to just ASCII and tries to validate that strictly.
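
Since dpkg does not enforce the encoding itself, a packager who wants that
check could run one before calling dpkg -b. The following is only a minimal
sketch in Python, not anything dpkg ships: the helper name find_invalid_utf8
and the sample Description field are made up for illustration.

```python
from typing import Optional


def find_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")  # decoding is strict by default
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Hypothetical control-file fragment, once in Latin-1 and once in UTF-8.
latin1_control = "Description: caf\u00e9\n".encode("latin-1")
utf8_control = "Description: caf\u00e9\n".encode("utf-8")

print(find_invalid_utf8(latin1_control))  # offset of the stray 0xE9 byte
print(find_invalid_utf8(utf8_control))    # None: the file is clean
```

A real check would read the bytes from DEBIAN/control in the build tree and
refuse to build when the function returns an offset.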

In this case I think there might be four (or more) potential bugs
here:

 1) The deb822(5) man page should probably be clarified to distinguish
    what to expect about encodings.
 2) dpkg-source (et al), dpkg-deb and dpkg might perhaps need to be
    improved to be stricter when parsing and validating their
    inputs, including encoding.
 3) The affected packages with wrong encoding should get bugs filed
    and fixed.
 4) The landscape client software should ideally cope more gracefully,
    and not fail when confronted with wrongly encoded files? Because
    these can also be generated by something that is not dpkg-deb, as
    people seem to be fond of creating their own .deb packers for their
    build systems and other tooling.
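
For item 4, one way a consumer of package metadata could cope is to try a
strict UTF-8 decode first and fall back to a lossy one. This is a minimal
sketch in Python under that assumption, not how the landscape client is
actually written, and decode_leniently is a hypothetical helper name. The
fallback is what yields the U+FFFD character mentioned in the original report.

```python
from typing import Tuple


def decode_leniently(data: bytes) -> Tuple[str, bool]:
    """Decode bytes as UTF-8; return (text, lossy).

    lossy is True when invalid bytes had to be replaced with U+FFFD.
    """
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        # Replace each undecodable sequence instead of raising.
        return data.decode("utf-8", errors="replace"), True


# A Latin-1 encoded value, as a wrongly encoded Description might contain.
text, lossy = decode_leniently(b"caf\xe9")
print(repr(text), lossy)
```

Returning the flag lets the caller log or report the bad package while still
carrying on with the rest of its work.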

> The broken description package is attached for further study.

Thanks, I've added an entry to my TODO to handle the above items from
the dpkg side.

Regards,
Guillem
