[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: process-deb-once-with-python review



On 2014-07-06 20:49, Jakub Wilk wrote:
> On IRC, Neils asked to me review the Python part of this:
> http://anonscm.debian.org/gitweb/?p=users/nthykier/lintian.git;a=shortlog;h=refs/heads/process-deb-once-with-python
> 

Hi Jakub,

Thanks for the review.

For those not on IRC, I have made a prototype python script for reducing
the load during unpacking.  The script is able to implement "unpacked",
"index" and "bin-pkg-control" for binary packages.  This idea was
spawned by testing libreoffice + plus its debs, which is horribly slow
(40minutes wall time, with 20m user time and 5-6m sys time)

IRT performance, I am fairly excited about the reduced load.  Particular
packages compressed with data.tar.xz seems to win a bit with this, since
we no longer decompress the same tar file twice (in separate processes).
 There is also a win in not having to write "data.tar" to the disk
(coll/index - for generating the two index files).

> 
> TL;DR: I don't think extract_tar() is secure.
> 

This is of course a requirement for merging this prototype.

FTR, there are some other "minor" issues with the prototype.  Currently
two of the "debs" tests fail.
 * One due to how they handle tarfiles with absolute paths.
   - tar(1) warns about it
   - this prototype silently fixes it => missing tags
   => this one can be fixed by emitting the same warning.
 * One due to "implausibly date"
   - tar(1) emits a warning
   - this prototype fails with "invalid header"
   => This one will probably require a "fall-back" to using tar(1)

>>            target = os.path.normpath(os.path.join(output_dir,
>> member.name))
>>            if (not target.startswith(output_dir_stem)
>>                and target != output_dir):
>>                raise ValueError("%s escapes unpack-root" % member.name)
> 
> os.path.normpath() doesn't access filesystem, so it won't protect you
> from escaping unpack-root via a symlink.
> 

You mean something like:

symlink usr to /etc
extract usr/password (which overwrites /etc/password)

That should be fairly trivial to fix by doing a follow up check with
realpath.  I originally used realpath, but was concerned about how it
reacted to missing files (as the target path usually does not exist).

For good measure, I guess we should assert that the target does not
exist (not even/especially not as a symlink) - if only as a fail-safe.

>>            elif member.issym():
>>                mtype = 'l'
>>                linkinfo = ' -> ' + member.linkname
>>            elif member.islnk():
>>                mtype = 'h'
>>                linkinfo = ' link to ' + member.linkname
> 
> What if linkname contains a newline, or another special character?
> Shouldn't it be escaped somehow?
>
>>            line = '%s0%o %d:%d:%s:%s %d %s _ %s%s\n' % (
>>                mtype, member.mode, uid, gid,
>>                owner, group, member.size,
>>                mdate.strftime('%Y-%m-%d'), name, linkinfo
>>            )
> 
> Same problem with escaping here.
> 

That is certainly a valid point.

>>            member.mode |= 0200
> 
> We should also make sure that the permissions won't be excessive
> (honoured umask, no setuid, no setgid etc.) and that the file will be
> owned by the user who runs the code.
> 

Certainly, that could be done.  Do you know if tar(1) does that w.
permissions as well (for normal users)?
  So far I have not had an issue with owners, despite extracting
data.tar, where all files are (usually) owned by root/root.  That said,
I guess we should be safe rather than sorry here as well.

>>            tf.extract(member, path=output_dir)
> 
> The extract() method is unfortunately quite dumb. :-( It will happily
> write over existing symlink, or write to an existing FIFO, or...
> 

If we ensure the extraction does not escape the unpack root and assert
that the target does not exist in any way, I guess this problem would be
avoided?


Reply to: