Bug#680971: python3-apt: apt_inst.TarFile.extractdata could accepts bytes
On Tue, Sep 10, 2013 at 10:31:02PM +0200, Julian Andres Klode wrote:
> On Tue, Sep 10, 2013 at 05:24:14PM +0200, Michael Vogt wrote:
> > On Mon, Sep 09, 2013 at 02:14:56PM +0200, Jakub Wilk wrote:
> > > * Julian Andres Klode <jak@debian.org>, 2013-08-26, 13:48:
> > > >>The apt_inst.TarFile.extractdata method requires that it's
> > > >>argument is a str. It would be nice if it also accepted bytes
> > > >>objects. This would be consistent with e.g. built-in open
> > > >>function, which accepts one or the other as filename.
> > > >
> > > >Thank you for your bug report; and yes, you're right. We are using
> > > >PyArg_ParseTuple() to parse the string. We currently use the
> > > >format string "s" for parsing file names. If you happen to know
> > > >the best format string we could use that accepts bytes strings as
> > > >well, please let us know. It has to work with Python 2.7 as well.
> > >
> > > The documentation for the "s" format reads: "This format does not
> > > accept bytes-like objects. If you want to accept filesystem paths
> > > and convert them to C character strings, it is preferable to use the
> > > O& format with PyUnicode_FSConverter() as converter."
> > >
> > > PyUnicode_FSConverter is not available in Python 2.X, though.
> >
> > The attached patch should implement this for Python 3.1+, python2.X is
> > unchanged (for now). Feedback/review welcome!
>
> The problem is that there are various other places where we deal with
> filenames. It does not make sense to fix this one function, but not
> the other ones.
>
> And it seems to be a memory leak, because py_member needs to be
> freed.
>
> What we could probably do is to write a custom converter that
> takes care of the memory management and also works on Python 2. The
> encoder would
> (a) if needed: convert unicode to bytes using PyUnicode_EncodeFSDefault
> (b) create something like the following from the bytes object, where the
> a reference to the bytes object is stored in the object member (the
> code here is not complete, it's only a quick draft):
>
> class PyApt_Filename {
> public:
> PyObject *object;
> char *path;
>
> PyApt_Filename(PyObject *object, char *path) {
> this->object = object;
> this->path = path;
> }
>
> ~PyApt_Filename() {
> Py_DECREF(object);
> }
>
> static int Converter(PyObject *object, void *out) {
> PyApt_Filename *real_out = (PyApt_Filename *) out;
>
> if (PyUnicode_Check(object)) {
> object = PyUnicode_EncodeFSDefault(object);
> } else {
> Py_INCREF(object);
> }
>
> if (!PyBytes_Check(object)) {
> ... <set exception> ...
> return 0;
> }
>
> real_out->object = object;
> real_out->path = PyBytes_AS_STRING(object);
>
> return 1;
> }
> }
>
> (we can also use C++ magic stuff and drop the path member, and
> overload an operator for use as char* if we want)
>
> By storing this on the stack we have automatic memory management
>
> PyApt_Filename path;
> if (PyArg_ParseTuple(args,"O&",PyApt_Filename::Converter, &path) == 0)
> .... < use path.path here to access file path > ...
> .... cleanup happens automatically ...
>
> I'll try to implement this tomorrow.
>
It took a bit longer, but I implemented this in the (temporary and
to be rebased) wip/bytes-path branch of the apt/python-apt.git
repository. I pushed it there so everyone can have a look.
It's missing test cases currently, and is not fully tested, I'll
add those later (definitely before pushing to master).
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
Reply to: