On IRC, Neils asked to me review the Python part of this: http://anonscm.debian.org/gitweb/?p=users/nthykier/lintian.git;a=shortlog;h=refs/heads/process-deb-once-with-python TL;DR: I don't think extract_tar() is secure.
target = os.path.normpath(os.path.join(output_dir, member.name)) if (not target.startswith(output_dir_stem) and target != output_dir): raise ValueError("%s escapes unpack-root" % member.name)
os.path.normpath() doesn't access filesystem, so it won't protect you from escaping unpack-root via a symlink.
elif member.issym(): mtype = 'l' linkinfo = ' -> ' + member.linkname elif member.islnk(): mtype = 'h' linkinfo = ' link to ' + member.linkname
What if linkname contains a newline, or another special character? Shouldn't it be escaped somehow?
line = '%s0%o %d:%d:%s:%s %d %s _ %s%s\n' % ( mtype, member.mode, uid, gid, owner, group, member.size, mdate.strftime('%Y-%m-%d'), name, linkinfo )
Same problem with escaping here.
member.mode |= 0200
We should also make sure that the permissions won't be excessive (honoured umask, no setuid, no setgid etc.) and that the file will be owned by the user who runs the code.
tf.extract(member, path=output_dir)
The extract() method is unfortunately quite dumb. :-( It will happily write over existing symlink, or write to an existing FIFO, or...
-- Jakub Wilk