
Re: Bug#1054657: transition: r-bioc-biocgenerics



On 7 November 2023 at 22:01, Charles Plessy wrote:
| One possible direction would be to leverage the work done by Dirk and
| others in r2u, where the Bioc transition is over, and for each package
| in Debian, look if the r2u equivalent has a dependency not in Debian.
| 
| https://fediscience.org/@eddelbuettel@mastodon.social/111359074099802189
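The check Charles suggests boils down to a per-package set difference. Below is a minimal Python sketch of it. It assumes the Debian package list and the r2u dependency map have already been fetched by some other means. The function name `missing_deps` and all package names are hypothetical, and no real r2u or Debian interface is queried.

```python
from typing import Dict, Iterable, Set


def missing_deps(
    debian_pkgs: Iterable[str],
    r2u_deps: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """For each package already in Debian, return the dependencies its
    r2u counterpart declares that Debian does not ship (hypothetical data)."""
    debian = set(debian_pkgs)
    report: Dict[str, Set[str]] = {}
    for pkg in sorted(debian):
        deps = r2u_deps.get(pkg)
        if deps is None:
            # No r2u counterpart known for this package: nothing to compare.
            continue
        gap = deps - debian  # dependencies r2u has but Debian lacks
        if gap:
            report[pkg] = gap
    return report


if __name__ == "__main__":
    # Hypothetical example data in Debian-style binary package naming.
    debian = {"r-bioc-biocgenerics", "r-bioc-examplea"}
    r2u = {
        "r-bioc-biocgenerics": set(),
        "r-bioc-examplea": {"r-bioc-biocgenerics", "r-bioc-exampleb"},
    }
    print(missing_deps(debian, r2u))
```

Packages that come back with a non-empty set would be the ones that cannot migrate until their missing dependencies are packaged.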

Thanks for the endorsement, Charles.

As you brought r2u up, allow me to add my perspective. I have done so before
without changing anyone's mind but once every few years I get to howl at
these windmills.

So I have been maintaining CRAN packages in Debian for 20 years [1], and I
have said for twenty years that we can trust CRAN. I meant it then, and I
mean it now.  Ditto for BioConductor.

Doubling all our testing up, and also throwing spanners into our own wheels
via the autopkgtests, is (to me) a waste of our (limited !!) volunteer time.
We *do* add value to CRAN (and BioConductor) because we build on much more
exotic platforms than they do.  But testing _again_ on core platforms like
x86_64 simply does not seem (to me) all that efficient.

My r2u [2] is a case in point. As of last Friday, I had ~270 BioConductor
packages in it (that is for Ubuntu LTS releases 20.04 and 22.04, and of course
in addition to the 22k CRAN packages each already has).  I then rebuilt those
270 first for 'focal' (20.04) and then 'jammy' (22.04) on my machine [3] and
uploaded them.

After that, I realized I could and should check against BioConductor's own
'popularity contest' [4,5] and ensured I had the top 200+ packages. I also
ran a `setdiff()` against the packages 'testing' knew. So over the weekend I
added packages from both of these sources. r2u is now at 391 or so BioConductor
packages, all at 3.18, for both 20.04 and 22.04. And 22.2k for CRAN.
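The weekend top-up described above is, in effect, a union followed by a set difference. Below is a minimal Python sketch of it. It assumes the popularity scores, the list of packages 'testing' knows, and the list of packages already built are all available as plain in-memory data. The function names and the `pkgA`... names are hypothetical. The real format of the BioConductor score file is not parsed here.

```python
from typing import Dict, Iterable, List


def top_n(scores: Dict[str, int], n: int) -> List[str]:
    """Return the n highest-scoring package names (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def packages_to_add(
    scores: Dict[str, int],
    n: int,
    in_testing: Iterable[str],
    already_built: Iterable[str],
) -> List[str]:
    """Union of the top-n popular packages and those in 'testing',
    minus what is already built (similar in spirit to R's setdiff())."""
    wanted = set(top_n(scores, n)) | set(in_testing)
    return sorted(wanted - set(already_built))


if __name__ == "__main__":
    # Hypothetical scores and package names, for illustration only.
    scores = {"pkgA": 900, "pkgB": 850, "pkgC": 800, "pkgD": 700}
    testing = ["pkgA", "pkgE"]
    built = ["pkgA", "pkgB"]
    print(packages_to_add(scores, 3, testing, built))  # ['pkgC', 'pkgE']
```

Taking the union first means a package only needs to be in one of the two sources to be picked up.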

This does provide the obvious existence proof that yes, right after a
BioConductor release, their stuff of course works: they have AFAIK paid staff
to ensure this.

r2u has been running for a little over 1 1/2 years. It has shipped over 10
million packages (and I luckily have access to a well-connected mirror on the
U of Illinois campus as I teach there part-time). It had a download spike in
October (from a European research center; I have access to the download logs),
fetching 3+ million in two days (!!). It now sees a daily (!!) download from
a 'well known US west coast tech giant' taking in about 5200 packages _each
day_ from what looks like a cron job. It serves about 1000 unique IPs each
day. There is clear demand for this.

So if we wanted to do something useful, we should extend r2u to Debian. I
have limited 'personal' bandwidth and hardware but if someone wanted to join
we could make some hay here.  People trust apt.  The technology is there and
works as we all know.

It might be worth discussing how we can offer the 19.9k packages on CRAN [6]
and all/most of BioConductor. We may want to do that in a to-be-determined
form outside the distro as the ftpmasters (whose work I so appreciate, so let
me say a big thanks here) cannot possibly 'manually' check 20k+
packages.

But as I said at the outset: We *can* trust CRAN and BioConductor and take
advantage and leverage their work which (among many other things) contains
the same authorship, copyright, IP, ... tests we do.

Thanks for listening to my sermon. I will now be quiet again and concentrate
on these (in aggregate coming up on) 45k packages. I do appreciate everything
that everybody does here -- we are after all a bunch of committed volunteers.

Cheers, Dirk

[1] The very first one we had was IIRC my r-cran-rodbc as ODBC headers always
    baffled users; and still do
[2] See https://eddelbuettel.github.io/r2u
[3] For BioConductor I cannot (?) use pre-made binaries as I do for (most of)
    CRAN via R-style binaries from p3m.dev which I turn into proper .deb files.
[4] They call it something else, and 'score' downloads by unique IP over a
    rolling (12 months if I recall) window
[5] See https://bioconductor.org/packages/stats/bioc/bioc_pkg_scores.tab
[6] CRAN purges reasonably aggressively which is how r2u is now at 22.2k
    while CRAN is at 19.9k.

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd@debian.org

