
Re: apache arrow anyone?



Hi Steffen and Etienne,

[...]
>> Apache Arrow https://arrow.apache.org/faq/ knows how to efficiently
>> handle large tabular data. And, while not in our distribution, it blocks
>> some workflows for Debian Med. Arrow comes with interfaces to all the
>> prominent languages, for the Med-workflows it is typically the Python
>> interface pyarrow that is needed.

I am also dealing with a project [1] that has made Arrow (the C++
interface, in my case) a mandatory dependency. The missing Arrow package
in Debian has prevented me from updating the packaging to the latest
upstream version, leaving it stuck at a version from May.

>> I am not using Arrow myself, but I presume just like me you all know
>> some project that should be using it :)

Yep :)

> Thank you for the perspective!  I see Sasha filed an RFP some
> time ago [1], so there is definitely interest in Apache Arrow.
> I don't know whether there is a packaging effort at the moment,
> but if there is, I haven't found it by searching on
> Salsa.

I have prepared a first shot at packaging Arrow that works well for
my case [2]. It replicates almost all binary packages built by
upstream's own packaging pipeline (for version 4, at least; that is when
I stopped looking at it), and I only had to tweak the build parameters a
little bit. FYI, they publish their own debs via JFrog and use their
own Ruby-based packaging tooling [3, 4].

The rest of the story is that I considered my package ready to upload,
but a project partner familiar with using Arrow let me know that
Arrow's development cycle is quite fast, with several breaking new major
versions recently, support for new languages being added all the time
(Rust, ...) and other code being adopted as well (Parquet, ...).
This made me a bit uneasy, as I only needed it as a dependency and did
not really want to bite off more than I could chew.

I contacted upstream to ask for support [5], but it looked to me like
they would rather not help out with Debian packaging directly.
They would probably consider specific patches from us but in general
stick to their own packaging tools. See the linked thread for more
information.
I must admit I have not really had the time so far to follow up and
explain how things are done in Debian, and that their approach to
packaging and ours are probably too different.

Long story short: I have finished packaging Arrow 4, which looks good
(someone might want to double-check the long d/copyright though), but I
am not sure I want to track and maintain it _on my own_ in the long run.
If someone from the Debian Med team wants to collaborate on this, be it
on packaging or upstream interaction, I would be willing to reconsider =)

Cheers
Sascha


[1] https://tracker.debian.org/pkg/vast
[2] https://salsa.debian.org/satta/arrow
[3] https://apache.jfrog.io/ui/native/arrow/debian
[4] https://github.com/apache/arrow/tree/master/dev/tasks/linux-packages
[5] https://lists.apache.org/thread.html/rcd366cf9bde72d69e942ea31f3a0f1066727f6c7e8915bfdda6f009a%40%3Cdev.arrow.apache.org%3E

