Re: process-deb-once-with-python review
On 2014-07-06 20:49, Jakub Wilk wrote:
> On IRC, Neils asked to me review the Python part of this:
> http://anonscm.debian.org/gitweb/?p=users/nthykier/lintian.git;a=shortlog;h=refs/heads/process-deb-once-with-python
>
Hi Jakub,
Thanks for the review.
For those not on IRC, I have made a prototype python script for reducing
the load during unpacking. The script is able to implement "unpacked",
"index" and "bin-pkg-control" for binary packages. This idea was
spawned by testing libreoffice + plus its debs, which is horribly slow
(40minutes wall time, with 20m user time and 5-6m sys time)
IRT performance, I am fairly excited about the reduced load. Particular
packages compressed with data.tar.xz seems to win a bit with this, since
we no longer decompress the same tar file twice (in separate processes).
There is also a win in not having to write "data.tar" to the disk
(coll/index - for generating the two index files).
>
> TL;DR: I don't think extract_tar() is secure.
>
This is of course a requirement for merging this prototype.
FTR, there are some other "minor" issues with the prototype. Currently
two of the "debs" tests fail.
* One due to how they handle tarfiles with absolute paths.
- tar(1) warns about it
- this prototype silently fixes it => missing tags
=> this one can be fixed by emitting the same warning.
* One due to "implausibly date"
- tar(1) emits a warning
- this prototype fails with "invalid header"
=> This one will probably require a "fall-back" to using tar(1)
>> target = os.path.normpath(os.path.join(output_dir,
>> member.name))
>> if (not target.startswith(output_dir_stem)
>> and target != output_dir):
>> raise ValueError("%s escapes unpack-root" % member.name)
>
> os.path.normpath() doesn't access filesystem, so it won't protect you
> from escaping unpack-root via a symlink.
>
You mean something like:
symlink usr to /etc
extract usr/password (which overwrites /etc/password)
That should be fairly trivial to fix by doing a follow up check with
realpath. I originally used realpath, but was concerned about how it
reacted to missing files (as the target path usually does not exist).
For good measure, I guess we should assert that the target does not
exist (not even/especially not as a symlink) - if only as a fail-safe.
>> elif member.issym():
>> mtype = 'l'
>> linkinfo = ' -> ' + member.linkname
>> elif member.islnk():
>> mtype = 'h'
>> linkinfo = ' link to ' + member.linkname
>
> What if linkname contains a newline, or another special character?
> Shouldn't it be escaped somehow?
>
>> line = '%s0%o %d:%d:%s:%s %d %s _ %s%s\n' % (
>> mtype, member.mode, uid, gid,
>> owner, group, member.size,
>> mdate.strftime('%Y-%m-%d'), name, linkinfo
>> )
>
> Same problem with escaping here.
>
That is certainly a valid point.
>> member.mode |= 0200
>
> We should also make sure that the permissions won't be excessive
> (honoured umask, no setuid, no setgid etc.) and that the file will be
> owned by the user who runs the code.
>
Certainly, that could be done. Do you know if tar(1) does that w.
permissions as well (for normal users)?
So far I have not had an issue with owners, despite extracting
data.tar, where all files are (usually) owned by root/root. That said,
I guess we should be safe rather than sorry here as well.
>> tf.extract(member, path=output_dir)
>
> The extract() method is unfortunately quite dumb. :-( It will happily
> write over existing symlink, or write to an existing FIFO, or...
>
If we ensure the extraction does not escape the unpack root and assert
that the target does not exist in any way, I guess this problem would be
avoided?
Reply to: