[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#680971: python3-apt: apt_inst.TarFile.extractdata could accepts bytes



On Tue, Sep 10, 2013 at 05:24:14PM +0200, Michael Vogt wrote:
> On Mon, Sep 09, 2013 at 02:14:56PM +0200, Jakub Wilk wrote:
> > * Julian Andres Klode <jak@debian.org>, 2013-08-26, 13:48:
> > >>The apt_inst.TarFile.extractdata method requires that it's
> > >>argument is a str. It would be nice if it also accepted bytes
> > >>objects. This would be consistent with e.g. built-in open
> > >>function, which accepts one or the other as filename.
> > >
> > >Thank you for your bug report; and yes, you're right. We are using
> > >PyArg_ParseTuple() to parse the string. We currently use the
> > >format string "s" for parsing file names. If you happen to know
> > >the best format string we could use that accepts bytes strings as
> > >well, please let us know. It has to work with Python 2.7 as well.
> > 
> > The documentation for the "s" format reads: "This format does not
> > accept bytes-like objects. If you want to accept filesystem paths
> > and convert them to C character strings, it is preferable to use the
> > O& format with PyUnicode_FSConverter() as converter."
> > 
> > PyUnicode_FSConverter is not available in Python 2.X, though.
> 
> The attached patch should implement this for Python 3.1+, python2.X is
> unchanged (for now). Feedback/review welcome!

The problem is that there are various other places where we deal with
filenames. It does not make sense to fix this one function, but not
the other ones.

And it seems to be a memory leak, because py_member needs to be
freed.

What we could probably do is to write a custom converter that
takes care of the memory management and also works on Python 2. The
encoder would 
 (a) if needed: convert unicode to bytes using PyUnicode_EncodeFSDefault
 (b) create something like the following from the bytes object, where the
     a reference to the bytes object is stored in the object member (the
     code here is not complete, it's only a quick draft):

class PyApt_Filename {
public:
	PyObject *object;
	char *path;

	PyApt_Filename(PyObject *object, char *path) {
		this->object = object;
		this->path = path;
	}

	~PyApt_Filename() {
		Py_DECREF(object);
	}

	static int Converter(PyObject *object, void *out) {
		PyApt_Filename *real_out = (PyApt_Filename *) out;
		
		if (PyUnicode_Check(object)) {
			object = PyUnicode_EncodeFSDefault(object);
		} else {
			Py_INCREF(object);
		}

		if (!PyBytes_Check(object)) {
			... <set exception> ...
			return 0;
		}

		real_out->object = object;
		real_out->path = PyBytes_AS_STRING(object);
		
		return 1;
	}
}

(we can also use C++ magic stuff and drop the path member, and
 overload an operator for use as char* if we want)

By storing this on the stack we have automatic memory management

     PyApt_Filename path;
     if (PyArg_ParseTuple(args,"O&",PyApt_Filename::Converter, &path) == 0)
     .... < use path.path here to access file path > ...
     .... cleanup happens automatically ...

I'll try to implement this tomorrow.

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.


Reply to: