
Re: Community renewal and project obsolescence



On 12/28/23 10:34, Rafael Laboissière wrote:

* M. Zhou <lumin@debian.org> [2023-12-27 19:00]:

Thanks for the code and the figure. Indeed, the trend is confirmed by fitting a linear model count ~ year to the new-members list. The coefficient is -1.39 members/year, which is significantly different from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year 2001, which could be interpreted as an outlier, the trend is still significant, with a drop of 0.98 members/year (F[1,21] = 8.48, p < 0.01).
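For the record, a minimal sketch of such a linear fit, using invented yearly counts rather than the real nm.debian.org data (the numbers below are assumptions for illustration only):

```python
# Sketch of a linear fit of new-member count against year.
# The counts here are made up; the real data come from nm.debian.org.
import numpy as np
from scipy.stats import linregress

years = np.arange(2000, 2024)                       # 24 hypothetical years
counts = np.array([52, 44, 39, 41, 35, 38, 30, 33,  # invented counts with a
                   28, 31, 25, 27, 24, 22, 25, 20,  # downward trend
                   18, 21, 16, 17, 14, 15, 12, 13])

res = linregress(years, counts)
print(f"slope = {res.slope:.2f} members/year, p = {res.pvalue:.4f}")
```

With real data, res.slope gives the fitted change in members per year and res.pvalue tests whether it differs from zero.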

I thought about using some models from population statistics, so we could estimate the DD birth rate and the DD retirement/departure rate, as well as make a prediction. But since the descendants of DDs are not naturally new DDs, the typical population models are unlikely to work well. The birth of a DD is more like a mutation, sort of.

Anyway, we do not need sophisticated mathematical models to conclude that Debian is an aging community. And yet we don't seem to have a good way to reshape the curve using Debian's funds -- this is one of the key problems behind the data.

P.S.1: The correct way to do the analysis above is with a generalized linear model, treating the counts as draws from a Poisson distribution (or, perhaps, as overdispersed count data). I will eventually add this to my code in Git.

Why not integrate them into nm.debian.org when they are ready?

P.S.2: In your Python code, it is possible to get the data frame directly from the web page, without copying&pasting. Just replace the line:

    df = pd.read_csv('members.csv', sep='\t')

by:

    df = pd.read_html("https://nm.debian.org/members/")[0]

I am wondering whether ChatGPT could have figured this out…

I just specified the CSV input format based on what I had copied. It produces well-formatted code with detailed documentation most of the time. I deleted a lot from its output to keep the snippet short.

I have to clarify one thing to avoid giving you a wrong impression of large language models. In fact, the performance of an LLM (such as ChatGPT) varies greatly with the prompt and the context people provide to it. Exploring this in-context learning capability is still a cutting-edge research topic. For current LLMs, the answers on boilerplate code like plotting (matplotlib) and simple statistics (pandas) are almost frighteningly good.

