Re: [Debconf-team] proposal for combining ranking factors for travel grants

To: David Bremner <david@tethera.net>, debconf team list <debconf-team@lists.debconf.org>
Subject: Re: [Debconf-team] proposal for combining ranking factors for travel grants
From: Gaudenz Steinlin <gaudenz@debian.org>
Date: Sun, 22 Feb 2015 21:42:21 +0100
Message-id: <[🔎] 87bnklel02.fsf@meteor.durcheinandertal.bofh>
In-reply-to: <[🔎] 87wq3bml6k.fsf@maritornes.cs.unb.ca>
References: <[🔎] 87wq3bml6k.fsf@maritornes.cs.unb.ca>

Hi David

Thanks for working on this. Your idea looks interesting. I still have
some questions on how this works. See below.

David Bremner <david@tethera.net> writes:

> Apologies in advance for the long email. If it's any consolation, it
> took even longer to write.
>
> Background
> ==========
>
> In previous meetings [1][2], an adhoc group of debconf people talked
> about the ranking process for travel grants. Our rough concensus was
> (plagiarizing myself):
>
>  1) Financial need was at least as important as contributions in ranking
>  people for travel sponsorship
>  2) Only financial need by itself is not sufficient for sponsorship.
>
> I propose formalising this as follows. Each applicant has a level of
> financial need (self assessment, see summit dropdown) and a score for
> contribution (subjective by the bursaries committee, hopefully following
> some to be decided guidelines; this score will be based on the
> "contributions to Debian" and "contributions to DebConf" fields in
> summit [3]).
>
> Ranking
> =======
>
> If 
>
>          (a.need >= b.need)  and (a.contribution > b.contribution)
>
> then it seems clear that 'a' should funded before 'b'. So rather than
> trying cleverly break ties in other cases, my proposal is to just apply
> apply this rule, to construct a directed graph with arcs a->b if a should
> be funded before 'b'.  We first fund all the people with no incoming
> arcs, remove them from the graph, and repeat [4].

I'm not sure if I interpret your PDF graph correct. Do you suggest that
everyone on the same line forms a cohort and should have the same
priority?

If so I don't understand how those that don't have any outgoing edges in
the (reduced) graph get there level. It looks like they are just moved
to the highest possible level. But is that really what we want?

Or to put it another way, why are number 29 and 36 ranked higher than 19
and 7. According to the ordering specified above there is no clear
justification for this.

It seems to me this problem arises because your graph drops edges over
more than one level. I'm not sure if this is really the case, but the
graph looks like results in contribution taking precedence over need. Is
that correct and intended?

Gaudenz

>
> Example data, script
> ====================
>
> This scheme doesn't lend itself well to implementing in a spreadsheet,
> so I wrote a simple python script to implement it.
>
> To see the prettier output, you can run
>
> python travel-rank.py --format=dot full.dat | dot -Tpdf > foo.pdf
>
> In case that is too much effort, I also attach the pdf file
> The data file is anonymised data from a previous year [1], to give some
> idea of how this would work in practice.  Since we did not have
> numbers for financial need that year, I essentially made them up
> (actually I computed them according to a silly heuristic, but you may as
> well assume they are random). The file format of "full.dat" is
>
> ident, need, score, requested_amount
>
> # Comments, Further improvements
>
> a) This is quite different than sorting lexicograpically; you can see
>    this because the test data is actually given in lexicographically
>    sorted order, first by need, then by contribution.
>
> b) There is no relationship between the two ranking scales; I haven't
>    played with a smaller scale for contributions, but as along as it
>    preserves the same ordering as in this data (i.e. we don't create
>    ties by rounding), the results should be the same.
>
> c) It would be reasonable to include lower thresholds for both values, either
>    as a preprocessing step or as part of the ranking script.
>
> d) the output graph includes incremental cost and running total for each
>    group
>
> e) The output graph leaves out many edges in order to increase
>    readability. Only edges from one level to the next are included,
>    since this is enough to enforce the ranking.
>
> f) You can, and probably should sanity check the graph against the test
>    data, by following a path from the top level. Bugs happen.
>
> g) this is all experimental, and even if we agree to follow this
>    approach, we should agree to trust the bursaries team to override the
>    results if they don't make sense.
>
> h) In case we need to fund a partial "cohort", we need some way to order
>    the cohort. We could sort that cohort according to contribution
>    score. E.g. if our budget is between 5000 and 6500, we would fund 19
>    but not 7. Or maybe a lottery, e.g. random permutation would be more
>    fair.
>    
> Congratulations. You either read the whole thing, or skipped to the end.
>
> [1]: http://meetbot.debian.net/debconf-team/2015/debconf-team.2015-01-08-20.01.html
> [2]: http://article.gmane.org/gmane.linux.debian.conference.team/12279
> [3]: This idea of combining the two fields subjectively is new in this
>      proposal, and comes out of working with cate to try and streamline the
>      application process.
> [4]: this is a kind of "Topological Sort", for those interested.
> [5]: in particular the scores don't match the amounts requested by
>      that person.
>
> #!/usr/bin/python
> # Copyright 2015 David Bremner <bremner@debian.org>
> # Licensed under the WTFPL v2
>
> import sys
> From decimal import *
> import argparse
> import copy
>
> class person:
>     def __init__(self, ident, need, score, cost):
>         self.ident = ident
>         self.need = need
>         self.score = score
>         self.cost = cost
>
>     # this function defines if 'self' should be definitely
>     # ranked after 'other'
>     
>     def __lt__(self, other):
>         return (self.need <= other.need) and \
>             (self.score < other.score)
>
> parser = argparse.ArgumentParser()
>
> parser.add_argument('--format', metavar = 'fmt',
>                     default = 'text',
>                     choices = ['text','dot'],
>                     help='{text, dot} (default text)')
>
> parser.add_argument('input', metavar = 'file',
>                     help = 'csv file')
>
> args=parser.parse_args()
>
> num_nodes = 0
>
> nodes = {}
>
> # input format is
> #   int, number, number, number
> with open(args.input,'r') as file:
>     for line in file:
>         num_nodes += 1
>         strs = line.split(',')
>         ident = int(strs[0])
>         numbers = [ Decimal(s) for s in strs[1:] ]
>         nodes[ident] = person(ident, *numbers)
>
> # we keep track of *inward* arcs, i.e. all nodes that
> # dominate a current node. 
> arcs = {}
>
> for ident in nodes.iterkeys():
>     arcs[ident] = set()
>     for other in nodes.iterkeys():
>         if nodes[ident] < nodes[other]:
>             arcs[ident].add(other)
>
> # we make copies here because our ranking algorithm is
> # destructive
>
> remaining = nodes.copy()
> parents = copy.deepcopy(arcs)
>
> levels = {}
> count = 0
>
> while len(remaining) != 0:
>
>     # find nodes (people) not dominated by any remaining nodes.
>     sources = set()
>     for node in remaining:
>         if len(parents[node]) == 0:
>             sources.add(node)
>
>     levels[count]=sources
>     count += 1;
>
>     # remove the most recently found set of sources from the
>     # graph, and update the parent information of other nodes
>     
>     for node in sources:
>         remaining.pop(node)
>         for other_node in remaining:
>             parents[other_node].discard(node)
>
> # From here on, it's all about generating pretty output
> total_cost = 0
> if args.format == 'dot':
>     print('digraph ranking {');
>     for index in levels.iterkeys():
>         level_cost=0
>         print('subgraph { rank = "same";')
>         for ident in levels[index]:
>             level_cost += nodes[ident].cost
>             print("{0:d};".format(ident))
>         total_cost += level_cost;
>         print 'level{0:d} [shape="record", label="{1:f}|{2:f}"];'.format(index,level_cost,total_cost)
>
>         print('}')
>         if index+1 in levels:
>             print ('level{0:d} -> level{1:d};'.format(index,index+1))
>             for current in levels[index]:
>                 for child in levels[index+1]:
>                     if current in arcs[child]:
>                         print("{0:d} -> {1:d};".format(current,child));
>
>
>     print('}');
>
> else:
>     # format == text
>     for index in levels.iterkeys():
>         print("{%s}\n" % ", ".join(str(x) for x in sorted(levels[index])))
> _______________________________________________
> Debconf-team mailing list
> Debconf-team@lists.debconf.org
> http://lists.debconf.org/mailman/listinfo/debconf-team

Reply to:

Follow-Ups:
- Re: [Debconf-team] proposal for combining ranking factors for travel grants
  - From: David Bremner <david@tethera.net>

References:
- [Debconf-team] proposal for combining ranking factors for travel grants
  - From: David Bremner <david@tethera.net>

Prev by Date: Re: [Debconf-team] Food [Re: Registration questions]
Next by Date: Re: [Debconf-team] Food [Re: Registration questions]
Previous by thread: [Debconf-team] proposal for combining ranking factors for travel grants
Next by thread: Re: [Debconf-team] proposal for combining ranking factors for travel grants
Index(es):
- Date
- Thread