
Re: Bug#1014908: ITP: gender-guesser -- Guess the gender from first name



Edward Betts <edward@4angle.com> writes:

> I've been writing some code to work out the gender balance of speakers
> at a conference. It parses the pentabarf XML of the schedule and feeds
> the speaker names to this module.

> Here's the results for Debconf 22.

> 72 speakers

> male              48   66.7%
> unknown           16   22.2%
> female             4    5.6%
> mostly_male        2    2.8%
> andy               1    1.4%
> mostly_female      1    1.4%

I fear this may be an example of statistics that look meaningful but
probably aren't because the error bar is much higher than the typical
consumer of the statistic intuitively thinks it is.  Although maybe that's
not a worry in this case since the program itself says that it totally
failed to make a guess about a quarter of the time.
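To make the error-bar point concrete, here is a rough sketch. It is not part of gender-guesser or the script described above. It takes the counts quoted above and asks how far each share could move if the 17 names the tool could not resolve went one way or the other. Those 17 are the "unknown" names plus the "andy" (androgynous) one. The helper name share_bounds is hypothetical. The sketch also assumes the tool's other guesses are correct, so the real spread is wider still.

```python
# Reported counts from the quoted DebConf 22 tally (72 speakers total).
COUNTS = {
    "male": 48,
    "unknown": 16,
    "female": 4,
    "mostly_male": 2,
    "andy": 1,
    "mostly_female": 1,
}


def share_bounds(counts, label, unresolved=("unknown", "andy")):
    """Return (low, high) fraction of speakers that could carry `label`.

    Hypothetical helper. Assumption: only the names the tool could not
    classify ("unknown" and the androgynous "andy") are treated as open;
    every other guess, including the "mostly_*" ones, is taken at face
    value, so these bounds still understate the real uncertainty.
    """
    total = sum(counts.values())
    base = counts.get(label, 0)
    open_names = sum(counts.get(k, 0) for k in unresolved)
    return base / total, (base + open_names) / total


if __name__ == "__main__":
    total = sum(COUNTS.values())
    # A single speaker moves any percentage by 100/72 points.
    print(f"one speaker = {100 / total:.1f} percentage points")
    for label in ("male", "female"):
        low, high = share_bounds(COUNTS, label)
        print(f"{label:<8} {low:6.1%} to {high:6.1%}")
```

With these counts, the female share could be anything from about 5.6% to about 29.2%. The male share could be anything from about 66.7% to about 90.3%. A single speaker shifts any figure by roughly 1.4 percentage points, so the third significant figure carries no information even before misclassified names are considered.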

I don't really have any objections to the package being in the archive;
this is certainly something that a lot of people seem to want to do and
thus seem to find some utility in doing.  But unless one has a
higher-quality source of data than just names (preferred pronouns, direct
self-identification, etc.), I personally would be worried about attaching
the appearance of scientific accuracy (three significant figures!) to data
that, depending on the nationalities involved and the strength of naming
conventions and other factors, may be only rough guesswork.

I know someone who keeps similar statistics as an aid to balancing the
range of authors of books he chooses to review, and I see why someone
would want to do that.  But he tries to use higher-quality data sources
than guessing based on names, and that feels like a best practice for that
kind of thing to me.

(Also, due to the limitations and history of naming conventions, the
software is inherently trying to map into a gender binary, which if one is
attempting to capture self-identification is likely to be unhelpful for
many populations, such as ones with lots of people under 30, due to not
having a way to represent nonbinary people.)

Anyway, that's just all my personal opinion and I don't think any of that
says that the package shouldn't be in the archive.  We package all sorts
of not-very-useful software and that's totally fine.  But I've worked in
identity management fields for long enough to have a professional
knee-jerk reaction to anyone doing computer analysis of names or trying to
record gender in any way other than simply asking people.  :)

-- 
Russ Allbery (rra@debian.org)              <https://www.eyrie.org/~eagle/>

