
Re: Some progress about the packaging of nextflow

On 02.06.2021 at 00:46, Noah Meyerhans wrote:
On Tue, Jun 01, 2021 at 01:03:11PM +0200, Steffen Möller wrote:
The biggest tasks would be to package the SDK of AWS [0] (which would also
benefit igv) and of Microsoft Azure [1]. We would also need to package
Apache Ignite [2].
After the freeze I shall discuss this on the ML of the Debian Java team. I
cannot really figure out how hard packaging those whole SDKs would be. I also
have a poor idea of the maintenance burden over time.

[0] https://github.com/aws/aws-sdk-java
[1] https://github.com/Azure/azure-sdk-for-java
[2] https://github.com/apache/ignite/
These sound a bit scary for a single person (just guessing from the
names that it might be huge) but in the end it will probably be used by
lots of other software which in turn might mean that there are more
people who might join the effort to package these.
Yip. Both for packaging and for maintenance. I CCed our cloud team which
may have some extra opinion/direction for us.
I don't think a lot of the cloud team members have much experience
packaging Java stuff.  However, we are involved in packaging tools and
SDKs published by various cloud vendors in other languages.  In my
experience maintaining, and helping with maintenance of, the AWS Python
and Go SDKs, here are the issues you'll likely need to contend with:

1. Bundled dependencies
2. Fast upstream release cadence

#1 applies to a lot of code published by a number of cloud services.
Rather than cope with different versions of their dependencies shipped
with the various operating systems they target, the upstream maintainers
of many of these projects find it easier to embed copies of their build
dependencies.  Per Debian policy, packages cannot use these bundled
copies and must instead rely on packaged versions of these dependencies.  So in
that case, you may need to do some amount of work to avoid using these
bundled dependencies, possibly including packaging the dependencies
themselves for Debian.  Fortunately, as far as I can tell, neither the
AWS SDK for Java nor the Azure SDK for Java seems to bundle its
dependencies.

For #2, cloud services release new features at a very high rate, and
consequently publish new SDK releases often.  If you look at the AWS SDK
for Java, for example, you'll see multiple tagged releases *per week*:

Fortunately, the vast majority of these releases contain only
machine-generated code changes based on API definitions (e.g.
Swagger/OpenAPI), and backward compatibility is maintained.
Additionally, the upstreams are generally good about sticking to
Semantic Versioning, so any breaking change will be reflected in the
version numbers.  With this in mind, once you have packaging
configuration in place, you should be able to (nearly) fully automate
the packaging of new releases.
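
To illustrate the Semantic Versioning point, here is a minimal sketch of
the check an automated update job might run before packaging a new
upstream release. It is an assumption for illustration only, not an
existing Debian tool: the function names and the version numbers are
hypothetical, and it only handles plain MAJOR.MINOR.PATCH versions.

```python
import re


def parse_semver(version):
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"not a plain semantic version: {version!r}")
    return tuple(int(part) for part in match.groups())


def classify_update(packaged, upstream):
    """Return 'none', 'patch', 'minor' or 'major' for packaged -> upstream."""
    old = parse_semver(packaged)
    new = parse_semver(upstream)
    if new <= old:
        return "none"   # nothing newer upstream
    if new[0] != old[0]:
        return "major"  # may contain breaking changes
    if new[1] != old[1]:
        return "minor"  # new features, backward compatible
    return "patch"      # fixes only


def can_automate(packaged, upstream):
    """Hypothetical policy: only non-breaking updates are packaged unattended."""
    return classify_update(packaged, upstream) in ("minor", "patch")


# Hypothetical version numbers, for illustration only.
print(classify_update("1.11.1000", "1.11.1034"))
print(can_automate("1.11.1034", "2.0.0"))
```

A major version bump would still be handed to a human for review; the
rest could go through the automated path.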

As I said, I don't think we in the cloud team have any meaningful
experience with Java, so I don't think we can be much help with the
actual packaging work.  I'm sure others on the team can correct me if
I'm wrong about that.  But I think we can help with guidance, reviews,
etc, if that's useful.

Thank you, Noah.

Let us start with a discussion of where the SDK itself should be
team-maintained: cloud, med, or java?

Personally, I'd opt for Java, since this seems to be the core
challenge for us. If this drags in other packages that we don't have in
Debian yet, well, then this should help improve Debian along the way,
since apparently something is missing that is used by professionals. All
of that would also fall within the Java team's realm of interest.

We need Nextflow earlier than we can reach to the clouds with it.
Covid-19 related workflows (like https://workflowhub.eu/workflows/19) are
what triggered all this, and we had better keep our focus. So how soon we
can start with the AWS SDK will depend on how the team for it comes
together.

