
Re: Updating python3-xlrd for pandas 1.5 compatibility



On Fri, 2023-02-24 at 19:33 +0100, Paul Gevers wrote:
> Hi Diane,
> 
> On 23-02-2023 08:12, Diane Trout wrote:
> > the version of python3-xlrd 1.2.0-3 in unstable/testing is too old
> > to
> > be used with pandas 1.5.3. (See Bug #1031701).
> 
> Do I understand correctly that this isn't an issue from the point of
> python3-xlrd and that only pandas is affected? While investigating
> for this reply I noticed src:pandas doesn't even have a dependency in
> any of its binaries.

It looks like the xlrd dependency was commented out because the Debian
version is too old, though apparently that was done 7 months ago.

https://salsa.debian.org/science-team/pandas/-/blob/main/debian/control#L45

Here's the pandas module that conditionally uses xlrd if it's
available.

https://salsa.debian.org/science-team/pandas/-/blob/main/pandas/io/excel/_xlrd.py
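
To illustrate what "conditionally uses xlrd if it's available" amounts to
in practice, here is a minimal sketch of checking whether an installed
xlrd is present and new enough. This is not the pandas code. The helper
names and the minimum version are my own assumptions for illustration,
so check the pandas documentation for the real requirement.

```python
from importlib import metadata

# Assumed minimum version, not taken from this thread.
ASSUMED_MIN_XLRD = "2.0.1"


def version_tuple(version):
    """Turn '1.2.0' into (1, 2, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def xlrd_status(minimum=ASSUMED_MIN_XLRD):
    """Return 'missing', 'too old', or 'ok' for the installed xlrd."""
    try:
        installed = metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        # xlrd is an optional dependency, so absence is not an error.
        return "missing"
    if version_tuple(installed) < version_tuple(minimum):
        return "too old"
    return "ok"


print(xlrd_status())
```

With the 1.2.0 currently in unstable/testing, this sketch would report
the package as too old under the assumed minimum.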

> 
> > As it is a really common
> > workflow to use pandas to read excel files, it'd be nice if the
> > version
> > of xlrd in bookworm was compatible.
> 
> As the maintainer of pandas, do you consider it an RC issue that
> pandas 
> can't convert it? I guess not because you say "it'd be nice" and you 
> don't even have the required dependency. How severe do you consider
> this 
> issue for pandas? pandas has a quite extensive autopkgtest, doesn't
> it 
> cover this use case? Apparently you knew this earlier, why do you
> bring 
> this up now?

The issue is somewhere between a minor and a normal bug: it breaks a
small component of the library.

I wouldn't claim to be a maintainer of pandas; I feel Rebecca Palmer
has been doing the vast majority of the work keeping pandas updated in
Debian.

I started investigating this after my coworker ran into it while trying
to process an .xls file. When I looked, I saw that someone else had also
recently filed the same bug report.

> 
> > Because of the freeze I wanted to check if it was appropriate to
> > upload
> > the new version,
> 
> I'd hope that the "rules" are clear: 
> https://release.debian.org/testing/freeze_policy.html#soft. You can 
> contact the Release Team if you need further clarification.
> 
> > and what kind of warning I should give to the other
> > developers.
> 
> It depends. I'm worried about what you write below.

That's fair.

The counterargument is that xlrd's support for handling the XML-based
.xlsx files has been unsafe since Python 3.9, and for a while the
recommendation has been to switch to another package such as openpyxl
to handle xlsx files.

(The xlrd release announcement thread mentions the removal, and then
goes on to discuss the security issues.)
https://groups.google.com/g/python-excel/c/IRa8IWq_4zk/m/Af8-hrRnAgAJ

The reason the issue doesn't show up much is that .xls files are
deprecated by nearly everyone; it only shows up when you're reading old
data or files generated by old software.

The reason this is likely a minor issue is that there's a simple
workaround, which is to convert your xls file to an xlsx file.

Here's Pandas's discussion about deprecating xlrd for xlsx files.
https://github.com/pandas-dev/pandas/issues/28547

 
> > Here's the list of packages I found that have any relationship to
> > python-xlrd, if it looked like the autopkgtests actually tested
> > using
> > the xlrd library and what the level of declared dependency is.
> > (none
> > means the package lacks autopackage tests)
> > 
> > > nemo                 | none     | Recommends    |
> > > odoo-14              | none     | Depends       |
> > > ofxstatement-plugins | none     | Depends       |
> > > psychopy             | unlikely | Depends       |
> > > python3-agateexcel   | yes      | Depends       |
> > > python3-canmatrix    | no       | Recommends    |
> > > python3-drslib       | no       | Recommends    |
> > > python3-glue         | yes      | Depends       |
> > > python3-pyspectral   | probably | Suggests      |
> > > python3-rows         | unlikely | Recommends    |
> > > python3-tablib       | unlikely | Depends       |
> > > visidata             | none     | Build-Depends |
> > > vistrails            | none     | Build-Depends |
> > > python-xrt           | none     | Build-Depends |
> > > pyutilib             | none     | Build-Depends |
> 
> If I read everything correctly, it seems like you're too late with
> this 
> change.


With a bit more wakefulness, I looked through the packages that have
any dependency on xlrd.

I think odoo-14 is the package most likely to have issues. They use
xlrd and seem to expect to be able to read and write xls & xlsx files
using xlrd. Needless to say, updating xlrd would then break the ability
to process xlsx files. Though of course the xlrd upstream thinks that's
unreliable, and I have no idea how important this feature is to them.

(the odoo repository also has tests, and someone could in theory write
autopkgtests for it)

I couldn't figure out what pyspectral is doing.

The packages ofxstatement-plugins, psychopy, python3-agateexcel,
python3-rows, python3-tablib, and visidata appear to also depend
on/recommend openpyxl, so they likely use xlrd for .xls files and
openpyxl for .xlsx files, as xlrd has been recommending.
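
As a rough sketch of that split (xlrd for .xls, openpyxl for .xlsx),
something like the following would work. The function names and the
suffix table are hypothetical and not taken from any of the packages
above. The only library call relied on is the engine argument of
pandas.read_excel.

```python
from pathlib import Path

# Hypothetical mapping: legacy binary files go to xlrd, XML-based ones
# to openpyxl.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def pick_engine(path):
    """Return the reader engine name for a spreadsheet path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no spreadsheet engine known for {suffix!r}") from None


def read_spreadsheet(path):
    """Read a spreadsheet with pandas, choosing the engine by suffix."""
    # Imported lazily so pick_engine() is usable without pandas installed.
    import pandas as pd

    return pd.read_excel(path, engine=pick_engine(path))
```

A package written this way keeps working after an xlrd update that drops
xlsx support, since xlrd is only ever asked to open .xls files.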

python3-canmatrix uses a different package, python3-xlsxwriter, to deal
with xlsx files:
https://salsa.debian.org/python-team/packages/python-canmatrix/-/blob/debian/main/setup.py#L104

Nemo looks to only be using xlrd for older .xls files, and has a
different tool for the newer files. They seem to dispatch on MIME types
and use this block for .xlsx files:
https://salsa.debian.org/search?search=vnd.openxmlformats-officedocument.spreadsheetml.sheet&nav_source=navbar&project_id=17703&group_id=2992&search_code=true&repository_ref=master

and this block for .xls files
https://salsa.debian.org/cinnamon-team/nemo/-/blob/master/search-helpers/mso-xls.nemo_search_helper

python3-drslib appears to expect to be used on .xls files
(from looking through this file):
https://sources.debian.org/src/drslib/0.3.1.p3-2/drslib/p_cmip5/init.py/

vistrails only lists xlrd as a build dependency. Its tests seem to
think it might work with both xls and xlsx files, but the test code in
the package seems to only test xls files.

As an aside, I found that python-xrt should probably remove
python3-xlrd from its build dependencies, as the package doesn't seem
to use it.
https://codesearch.debian.net/search?q=package%3Apython-xrt+xlrd

Ultimately, the argument that this is a relatively minor feature cuts
both ways: it suggests the risk of updating is relatively low, but
also that there's less reason to update.

Thank you for your time evaluating this request.
Diane
