[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#680971: python3-apt: apt_inst.TarFile.extractdata could accepts bytes



On Tue, Sep 10, 2013 at 10:31:02PM +0200, Julian Andres Klode wrote:
> On Tue, Sep 10, 2013 at 05:24:14PM +0200, Michael Vogt wrote:
> > On Mon, Sep 09, 2013 at 02:14:56PM +0200, Jakub Wilk wrote:
> > > * Julian Andres Klode <jak@debian.org>, 2013-08-26, 13:48:
> > > >>The apt_inst.TarFile.extractdata method requires that it's
> > > >>argument is a str. It would be nice if it also accepted bytes
> > > >>objects. This would be consistent with e.g. built-in open
> > > >>function, which accepts one or the other as filename.
> > > >
> > > >Thank you for your bug report; and yes, you're right. We are using
> > > >PyArg_ParseTuple() to parse the string. We currently use the
> > > >format string "s" for parsing file names. If you happen to know
> > > >the best format string we could use that accepts bytes strings as
> > > >well, please let us know. It has to work with Python 2.7 as well.
> > > 
> > > The documentation for the "s" format reads: "This format does not
> > > accept bytes-like objects. If you want to accept filesystem paths
> > > and convert them to C character strings, it is preferable to use the
> > > O& format with PyUnicode_FSConverter() as converter."
> > > 
> > > PyUnicode_FSConverter is not available in Python 2.X, though.
> > 
> > The attached patch should implement this for Python 3.1+, python2.X is
> > unchanged (for now). Feedback/review welcome!
> 
> The problem is that there are various other places where we deal with
> filenames. It does not make sense to fix this one function, but not
> the other ones.
> 
> And it seems to be a memory leak, because py_member needs to be
> freed.
> 
> What we could probably do is to write a custom converter that
> takes care of the memory management and also works on Python 2. The
> encoder would 
>  (a) if needed: convert unicode to bytes using PyUnicode_EncodeFSDefault
>  (b) create something like the following from the bytes object, where the
>      a reference to the bytes object is stored in the object member (the
>      code here is not complete, it's only a quick draft):
> 
> class PyApt_Filename {
> public:
> 	PyObject *object;
> 	char *path;
> 
> 	PyApt_Filename(PyObject *object, char *path) {
> 		this->object = object;
> 		this->path = path;
> 	}
> 
> 	~PyApt_Filename() {
> 		Py_DECREF(object);
> 	}
> 
> 	static int Converter(PyObject *object, void *out) {
> 		PyApt_Filename *real_out = (PyApt_Filename *) out;
> 		
> 		if (PyUnicode_Check(object)) {
> 			object = PyUnicode_EncodeFSDefault(object);
> 		} else {
> 			Py_INCREF(object);
> 		}
> 
> 		if (!PyBytes_Check(object)) {
> 			... <set exception> ...
> 			return 0;
> 		}
> 
> 		real_out->object = object;
> 		real_out->path = PyBytes_AS_STRING(object);
> 		
> 		return 1;
> 	}
> }
> 
> (we can also use C++ magic stuff and drop the path member, and
>  overload an operator for use as char* if we want)
> 
> By storing this on the stack we have automatic memory management
> 
>      PyApt_Filename path;
>      if (PyArg_ParseTuple(args,"O&",PyApt_Filename::Converter, &path) == 0)
>      .... < use path.path here to access file path > ...
>      .... cleanup happens automatically ...
> 
> I'll try to implement this tomorrow.
> 

It took a bit longer, but I implemented this in the (temporary and
to be rebased) wip/bytes-path branch of the apt/python-apt.git 
repository. I pushed it there so everyone can have a look.

It's missing test cases currently, and is not fully tested, I'll
add those later (definitely before pushing to master).

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.


Reply to: