Re: Community renewal and project obsolescence

To: "M. Zhou" <lumin@debian.org>
Cc: Debian Project <debian-project@lists.debian.org>, Sébastien Villemot <sebastien@debian.org>
Subject: Re: Community renewal and project obsolescence
From: Rafael Laboissière <rafael@debian.org>
Date: Thu, 28 Dec 2023 16:34:04 +0100
Message-id: <ZY2VbCijAV2Abm3Y@laboissiere.net>
In-reply-to: <7b3101612ad3ae5747e6bf8f80b59a02a231b00d.camel@debian.org>
References: <ZYyFIWNt_gEJoPuR@laboissiere.net> <7b3101612ad3ae5747e6bf8f80b59a02a231b00d.camel@debian.org>

* M. Zhou <lumin@debian.org> [2023-12-27 19:00]:

Thanks for sharing the figure. The data seems correlated with the number of new Debian accounts. See the figure below:

Python code for this figure:


```
# modified from ChatGPT.
# XXX: members.csv is copy-pasted from https://nm.debian.org/members/
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('members.csv', sep='\t')
df = df[df['Since'] != '(unknown)']  # filter out invalid data
df['Since'] = pd.to_datetime(df['Since'])
df['Year'] = df['Since'].dt.year
# number of accounts per year, plus a 3-year rolling mean
account_counts = df['Year'].value_counts().sort_index()
smoothed_counts = account_counts.rolling(window=3).mean()
plt.figure(figsize=(10, 6))
plt.bar(account_counts.index, account_counts.values, color='skyblue')
plt.plot(smoothed_counts.index, smoothed_counts.values, color='orange',
         label='Smoothed (Window=3)')
plt.xlabel('Year')
plt.ylabel('Number of Accounts Created')
plt.title('Number of Accounts Created Each Year')
plt.legend()
plt.savefig('nm-year.png')
```

Thanks for the code and the figure. Indeed, the trend is confirmed by fitting a linear model count ~ year to the new members list. The coefficient is -1.39 member/year, which is significantly different from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year 2001, which could be interpreted as an outlier, the trend is still significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p < 0.01).
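For readers who want to reproduce this kind of fit, here is a minimal sketch of an ordinary least-squares fit of count ~ year with its overall F statistic, in plain Python. It is not the code used for the numbers above; the function name `fit_linear_trend` and the toy data are assumptions made for illustration, and the toy counts are invented, not taken from nm.debian.org.

```python
# Sketch: least-squares fit of count ~ year and the overall F statistic.
# The data below are invented placeholders, NOT the nm.debian.org counts.

def fit_linear_trend(years, counts):
    """Return (slope, intercept, f_stat, df_resid) for the model count ~ year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and regression sums of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(years, counts))
    ss_reg = slope * sxy
    df_resid = n - 2  # one slope and one intercept are estimated
    f_stat = ss_reg / (ss_res / df_resid)
    return slope, intercept, f_stat, df_resid


toy_years = [2000, 2001, 2002, 2003, 2004, 2005]
toy_counts = [60, 58, 61, 55, 54, 52]
slope, intercept, f_stat, df_resid = fit_linear_trend(toy_years, toy_counts)
print(f"slope = {slope:.2f} per year, F[1,{df_resid}] = {f_stat:.2f}")
```

With 24 yearly counts the residual degrees of freedom are 22, which matches the F[1,22] notation. The p-value would come from the F distribution with (1, df_resid) degrees of freedom, for instance via `scipy.stats.f.sf(f_stat, 1, df_resid)` if SciPy is available.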


Best,

Rafael Laboissière

P.S.1: The correct way to do the analysis above is by using a generalized linear model, with the count data assumed to follow a Poisson distribution (or, perhaps, by considering overdispersed data). I will eventually add this to my code in Git.
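As a rough illustration of what such a model involves, the sketch below fits a Poisson regression with a log link, log(mean count) = b0 + b1 * year, by Newton-Raphson in plain Python, and reports a Pearson dispersion estimate as a crude check for overdispersion. This is only a sketch under stated assumptions, not the code in Git: the function name `fit_poisson_trend`, the centering of the years, and the toy data are all choices made here for illustration, and the toy counts are invented.

```python
# Sketch: Poisson GLM (log link) for count ~ year, fitted by Newton-Raphson.
# The data below are invented placeholders, NOT the nm.debian.org counts.
import math


def fit_poisson_trend(years, counts, max_iter=50, tol=1e-12):
    """Return (b1, b0, dispersion) for log(mu) = b0 + b1 * (year - mean year)."""
    n = len(years)
    mean_x = sum(years) / n
    xs = [x - mean_x for x in years]   # centering keeps the Hessian well conditioned
    b0 = math.log(sum(counts) / n)     # start from the intercept-only model
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the Poisson log-likelihood
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * x for y, m, x in zip(counts, mu, xs))
        # Negative Hessian (Fisher information), a 2x2 symmetric matrix
        h00 = sum(mu)
        h01 = sum(m * x for m, x in zip(mu, xs))
        h11 = sum(m * x * x for m, x in zip(mu, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    mu = [math.exp(b0 + b1 * x) for x in xs]
    # Pearson chi-square over residual degrees of freedom
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mu))
    dispersion = pearson / (n - 2)
    return b1, b0, dispersion


toy_years = [2000, 2001, 2002, 2003, 2004, 2005]
toy_counts = [60, 58, 61, 55, 54, 52]
b1, b0, dispersion = fit_poisson_trend(toy_years, toy_counts)
print(f"yearly rate ratio = {math.exp(b1):.3f}, dispersion = {dispersion:.2f}")
```

Here exp(b1) is the multiplicative change in the expected count per year, so a value below 1 indicates a decline. A dispersion estimate well above 1 would suggest overdispersion, in which case a quasi-Poisson or negative binomial model may be more appropriate.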

P.S.2: In your Python code, it is possible to get the data frame directly from the web page, without copying and pasting. Just replace the line:


    df = pd.read_csv('members.csv', sep='\t')

by:

    df = pd.read_html("https://nm.debian.org/members/")[0]

I am wondering whether ChatGPT could have figured this out…
