Michael Vogt wrote:
> On Sat, Oct 22, 2005 at 06:00:04PM +0200, Petr Vandrovec wrote:
>> I have no idea what I'm doing wrong. So I wrote something that looks more reasonable to me than Geller's original patch. Am I really the only one who cares that apt is corrupting the packages it downloads? This time tetex-base_3.0.orig.tar.gz is corrupted on redownload because its last byte is CR (0D) (it is replaced with 'H'; see previous messages in this bug report and bug 290694).
>
> I applied your patch to the apt I uploaded to experimental. I'm not entirely sure about possible side effects of the patch, so I would like to see it tested in experimental first.
Thanks. The original code seems to think that the line delimiter is either LF or LF-CR, but the RFC says quite clearly that it is CR-LF. So usually you have:

    Content-type: application/octet-stream<CR><LF>
    <CR><LF>
    binarydata

which the current parser parses as 'Content-type: application/octet-stream<CR><LF><CR>' and then finds an empty line with <LF> only.
But if the binary data start with <CR>, the old parser finds an empty line with <LF><CR>, eating the first byte of the data payload. The whole payload is then shifted by one byte, and at the end the first byte of the following 'HTTP/1.0 ...' response is eaten, causing a failure for the subsequently downloaded package because the parser does not understand the 'TTP/1.0' header.
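
To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is only a model of the old rule ("a line ends at LF, optionally followed by CR"), not apt's actual C++ code; the function name `old_header_end`, the sample header, and the 5-byte payload are assumptions made up for illustration.

```python
def old_header_end(buf: bytes) -> int:
    """Model of the old rule: a line ends at LF, optionally followed by CR.

    Returns the offset where this parser believes the body starts,
    or -1 if no empty line has been seen yet.
    """
    pos = 0
    while True:
        lf = buf.find(b"\n", pos)
        if lf == -1:
            return -1
        end = lf + 1
        # Assumed old behaviour: swallow a CR that directly follows the LF.
        if buf[end:end + 1] == b"\r":
            end += 1
        # The text between the previous delimiter and this LF is the line.
        if buf[pos:lf] == b"":
            return end
        pos = end


headers = b"Content-type: application/octet-stream\r\n\r\n"
payload = b"\rDATA"  # binary data that happens to start with CR
stream = headers + payload + b"HTTP/1.0 200 OK\r\n"

start = old_header_end(stream)
body = stream[start:start + len(payload)]
rest = stream[start + len(payload):]
print(body)  # b'DATAH'
print(rest)  # b'TTP/1.0 200 OK\r\n'
```

The leading CR of the payload is consumed as part of the "empty line", so the 5-byte body read ends one byte too late and takes the 'H' of the next response with it.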
With my fixes it parses the example above as 'Content-type: application/octet-stream<CR><LF>' followed by an empty '<CR><LF>' line, and it should never look beyond this last <LF> into the transferred data.
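
For comparison, a minimal Python sketch of the corrected rule: a line ends at LF, and a single CR immediately before that LF belongs to the terminator. Again this is an illustration under assumptions, not the patch itself; `fixed_header_end` and the sample data are hypothetical names and values.

```python
def fixed_header_end(buf: bytes) -> int:
    """Model of the corrected rule: lines end with CR-LF (bare LF tolerated).

    Returns the offset of the first body byte, or -1 if the empty
    line that ends the headers has not arrived yet.
    """
    pos = 0
    while True:
        lf = buf.find(b"\n", pos)
        if lf == -1:
            return -1
        line = buf[pos:lf]
        # A CR directly before the LF is part of the line terminator.
        if line.endswith(b"\r"):
            line = line[:-1]
        if line == b"":
            # Stop right after this LF; nothing beyond it is inspected.
            return lf + 1
        pos = lf + 1


headers = b"Content-type: application/octet-stream\r\n\r\n"
payload = b"\rDATA"  # binary data that happens to start with CR
stream = headers + payload + b"HTTP/1.0 200 OK\r\n"

start = fixed_header_end(stream)
body = stream[start:start + len(payload)]
rest = stream[start + len(payload):]
print(body)  # b'\rDATA'
print(rest)  # b'HTTP/1.0 200 OK\r\n'
```

Because the scan stops at the LF of the empty line, a payload that begins with CR is left intact and the following response still starts with 'HTTP/1.0'.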
Maybe this loop should skip all <CR> bytes while doing the copy, so that subsequent code can rely on lines separated by a single <LF> only, but as it seems that everybody already handles random <CR>s scattered through the headers, it would be just a cleanup...

>> Can somebody at least tell me why this important data-corrupting bug has been ignored for more than a year?
>
> Probably because this is a very central piece of the code and any mistake here is fatal. Anyway, it's in experimental now and let's hope we find enough people to test it :)

Thanks.
Petr Vandrovec