
Re: Bug#1064291: marked as done (ITP: python-awkward -- manipulate JSON-like data with NumPy-like idioms)



Hi Sascha,

On Sun, Feb 25, 2024 at 08:32:52PM +0100, Sascha Steinbiss wrote:
> > nox > Creating virtual environment (virtualenv) using python3 in .nox/prepare
> > nox > python -m pip install build numpy packaging PyYAML requests tomli
> > nox > Command python -m pip install build numpy packaging PyYAML requests tomli failed with exit code 1:
> > WARNING: The directory '/nonexistent/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
> > ...
 
> Hmm, I never had that.

Sounds good.  Maybe something is broken on my side?

> The buildd builds also seem to pass this step with no
> errors.

Even better.

> I am wondering if nox wants to pull anything from PyPI using pip and
> either your DNS configuration is weird or your build container cannot access
> the Internet (which then again it shouldn't)... but if buildds are not
> supposed to access the Internet why don't the builds fail there? On the
> buildds the issue there seems to be related to a script using an old(?)
> Python YAML library syntax:
> https://buildd.debian.org/status/package.php?p=python-awkward

I admit I'm fine with the situation that the build works everywhere
except for me as long as I have no reason to build it. ;-)
 
> > Even worse:  It requires pyarrow
> 
> It does? The project metadata, where all other dependencies are listed, does
> not mention it, and the dependencies that are listed there are all from
> Debian:
> 
> ...
> dependencies = [
>     "awkward_cpp==29",
>     "importlib_metadata>=4.13.0;python_version < \"3.12\"",
>     "numpy>=1.18.0",
>     "packaging",
>     "typing_extensions>=4.1.0; python_version < \"3.11\"",
>     "fsspec>=2022.11.0"
> ]
> ...
> 
> Grepping through the test requirements though seems to indicate that the
> tests in the awkward package need pyarrow:

It's not only the tests:

$ grep -Rl pyarrow src/* | wc -l
76
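
A plain grep count also matches comments, docstrings and imports that only run when a function is called, so it is an upper bound on how many files need pyarrow at import time. As a rough sketch (the helper name classify_imports is made up for illustration and is not part of awkward), the two kinds of import can be told apart with the standard ast module:

```python
import ast


def classify_imports(source: str, module: str = "pyarrow") -> dict:
    """Count imports of `module` that are direct module-level statements
    versus imports nested inside functions, classes, try blocks, etc."""
    tree = ast.parse(source)
    # Statements that sit directly in the module body run on "import foo".
    top_level_nodes = {id(node) for node in tree.body}
    counts = {"top_level": 0, "nested": 0}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        if any(n == module or n.startswith(module + ".") for n in names):
            key = "top_level" if id(node) in top_level_nodes else "nested"
            counts[key] += 1
    return counts


example = (
    "import pyarrow\n"
    "\n"
    "def f():\n"
    "    import pyarrow.compute as pc\n"
    "    return pc\n"
)
print(classify_imports(example))
```

Files with only "nested" hits would still import without pyarrow installed, but the affected functions would fail when called.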

> ❯ grep pyarrow *requi* pyproject.toml
> requirements-test-full.txt:pyarrow>=7.0.0;sys_platform != "win32" and python_version < "3.12"
> requirements-test-minimal.txt:pyarrow==7.0.0
> 
> But I disabled the Python unit tests (mostly because I couldn't get them to
> work with the combined C++ and Python package built here) so I didn't catch
> it.
> I was also under the impression that awkward working with data from pyarrow
> would just be one use case, but not a hard requirement for awkward to work.
> Probably too optimistic given your observations :/

Seems so.  Maybe we finally need to tackle pyarrow.
 
> > ____________________ ERROR collecting tests/test_awkward.py ____________________
> > ImportError while importing test module '/tmp/autopkgtest.sD3d5G/autopkgtest_tmp/tests/test_awkward.py'.
> > Hint: make sure your test modules/packages have valid Python names.
> > Traceback:
> > /usr/lib/python3.12/importlib/__init__.py:90: in import_module
> >      return _bootstrap._gcd_import(name[level:], package, level)
> > tests/test_awkward.py:205: in <module>
> >      ak.str.to_categorical(ak.Array([["a", "b", "c"], ["a", "b"]])),
> > /usr/lib/python3/dist-packages/awkward/_dispatch.py:62: in dispatch
> >      next(gen_or_result)
> > /usr/lib/python3/dist-packages/awkward/operations/str/akstr_to_categorical.py:53: in to_categorical
> >      return _impl(array, highlevel, behavior, attrs)
> > /usr/lib/python3/dist-packages/awkward/operations/str/akstr_to_categorical.py:60: in _impl
> >      pc = import_pyarrow_compute("ak.str.to_categorical")
> > /usr/lib/python3/dist-packages/awkward/_connect/pyarrow.py:60: in import_pyarrow_compute
> >      raise ImportError(error_message.format(name))
> > E   ImportError: to use ak.str.to_categorical, you must install pyarrow:
> 
> In that case it looks like anndata needs a feature of awkward that also
> needs pyarrow. Which is, yeah, a problem, given that we don't have anything
> from Arrow in Debian (yet). Since pyarrow is just bindings that need the
> rest of the library (https://arrow.apache.org/docs/python/index.html) then
> this pulls in another large dependency.

That's correct.
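
For reference, the traceback above suggests a deferred optional import: the package itself imports fine, and the ImportError is only raised once a pyarrow-backed function is called. A minimal sketch of that pattern, assuming nothing about awkward's real code (import_optional and ERROR_MESSAGE are hypothetical names chosen for illustration):

```python
import importlib

# Hypothetical message template, loosely modelled on the error text
# seen in the traceback; not copied from awkward.
ERROR_MESSAGE = "to use {0}, you must install {1}"


def import_optional(module_name: str, feature: str):
    """Import `module_name` lazily; raise a descriptive ImportError
    naming the feature that needs it if the module is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(ERROR_MESSAGE.format(feature, module_name)) from err


def to_categorical_like(values):
    # The optional dependency is resolved only when the function is called,
    # so importing the enclosing package works without it.
    import_optional("module_that_is_not_installed_xyz", "demo.to_categorical")
    return values
```

With this layout a test module that calls such a function at module level, as tests/test_awkward.py does at line 205, fails during collection rather than inside a single test.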
 
> > Since python-anndata is the last package with Python3.12 issues there is some work
> > left on this front.
> 
> True. I am afraid that Arrow would be too much for me to take care of at the
> moment, sorry. I started it a while ago but quickly found that I bit
> off more than I can chew without it becoming a full-time task.
> (See my ITP and the thread on it)

I'm aware of this.  Thank you for your work on this anyway.

Kind regards
    Andreas.


-- 
http://fam-tille.de

