
Source package contains non-free IETF RFC/I-D's



Some people raised a concern about false positives in my reports -- and also
tagged all the bugs with etch-ignore.  I went through all bug reports
manually yesterday (see earlier mail), but I also realized that it
would be possible to do this automatically, to provide further
assurance that the bugs indicate real and confirmed problems.

I've updated my script to do this; it is at the bottom of the page:
http://wiki.debian.org/NonFreeIETFDocuments

The script will run md5sum on the RFC/I-D in source packages, and
compare them against a known-real repository (rsync'ed against
ftp.rfc-editor.org).
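
For readers who want to reproduce the check, the comparison can be sketched roughly as follows.  This is not the script from the wiki page, only a minimal illustration; the function names and the assumption that the reference copy has the same file name as the file in the source package are mine.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(candidate, reference_dir):
    """Compare a document from a source package with the reference copy.

    Assumption: the reference copy lives in reference_dir under the same
    file name as the candidate (hypothetical layout).
    """
    reference = Path(reference_dir) / Path(candidate).name
    if not reference.is_file():
        # No known-good file to compare with.
        return "FETCH-FAIL"
    if md5_of(candidate) == md5_of(reference):
        return "MATCH"
    return "MISMATCH"
```

A call such as classify("unpacked/doc/rfc2822.txt", "mirror/") (both paths hypothetical) would then give one of the three results described below.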

The output of the script is very long, so I won't include it here.  It
is available at:
http://josefsson.org/bcp78broken/debian-ietf-documents-diff.txt

To parse the output yourself, look for lines beginning with 'pkg'.
Those denote the start of a new package with potential problems.
After that there will be lines such as 'tar xfz...' and two MD5 sums.
If the MD5 sums match, it will print MATCH.  If the MD5 sums mismatch,
it will print MISMATCH.  If it can't find a known-good file to compare
with, it prints FETCH-FAIL.
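
As a rough aid for parsing, something like the sketch below could tally the results.  It assumes that the result keyword is the first whitespace-separated word on its line (as in the FETCH-FAIL lines quoted further down); I have not checked that against every line of the output, and the function name is made up for the example.

```python
from collections import Counter

STATUSES = ("MATCH", "MISMATCH", "FETCH-FAIL")


def summarize(lines):
    """Count package headers and result keywords in the script output.

    Assumption: a package starts on a line whose first word begins with
    'pkg', and each result keyword is the first word on its own line.
    """
    packages = 0
    counts = Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0].startswith("pkg"):
            packages += 1
        elif words[0] in STATUSES:
            counts[words[0]] += 1
    return packages, counts
```

Feeding it the lines of the downloaded debian-ietf-documents-diff.txt (for example via open(...) on a local copy) should give numbers comparable to the statistics below, if the format assumption holds.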

Some statistics:
  74 packages
 401 MATCH, i.e., the RFC in the source package is an authentic RFC
  79 MISMATCH, i.e., the RFC differs from the authentic RFC
   6 FETCH-FAIL

Note that this does _not_ mean that there were 79 false positives in
my reports.  Nothing I did today indicates that there are any more
false positives except (possibly) draft-zebra-00.txt that I found
manually yesterday.

The FETCH-FAILs are few and easy to analyze:

FETCH-FAIL draft-davis-dasl-protocol-00.txt
FETCH-FAIL spf-draft-20040209.txt
FETCH-FAIL spf-draft-200405.txt
FETCH-FAIL rfc.txt
FETCH-FAIL rfc.txt
FETCH-FAIL draft-zebra-00.txt

I can't find the first document anywhere on the Internet; possibly the
filename is incorrect, although it looks like a submitted IETF
document.  The spf-* files were submitted through the IETF under other names.
rfc.txt is a dummy file.  draft-zebra-00.txt was the likely false
positive I found manually yesterday.

The MISMATCHes are more interesting to analyze, and have a variety of
causes.

As can be seen in the file, just a few pages down, one reason is that
the RFC in the source package differs from the authentic RFC!
E.g., typos have been corrected.  Modifying the document is not
permitted by the IETF license, so these files do not seem to be
legally distributable at all, not even in non-free.

Several files differ trivially, such as removed/added initial/terminal
newlines, or changing multiple newlines into one newline.
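
To separate such trivial differences from real modifications, one could compare the texts after normalizing newlines.  The following is only a sketch of that idea, not something the script does; the function names are hypothetical.

```python
import re


def normalize(text):
    """Collapse runs of newlines and drop initial/terminal newlines."""
    collapsed = re.sub(r"\n{2,}", "\n", text)
    return collapsed.strip("\n")


def differs_only_trivially(packaged, authentic):
    """True if the two texts differ, but only in their newlines."""
    return packaged != authentic and normalize(packaged) == normalize(authentic)
```

Files for which differs_only_trivially() is true would still show up as MISMATCH in an MD5 comparison, even though the wording of the document is unchanged.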

At least one file differs due to RCS $Id$ tags.

In the DateTime-Format-Mail archive, the files differ substantially
because the source package only contains a small excerpt from the RFC,
instead of the entire RFC.

Some files show up as different only because I can't compare them to
the real document: the IETF used to publish a "RIP-notice", saying that
the document had expired, under the same filename.  The diff output for
all of them suggests that these are real IETF documents, though.

/Simon


