Hi there!
While looking for possible sponsors for DebConf13[1], I wrote a
quick&dirty shell script (attached, but read [2]) to extract some
information from the www.debian.org/users/ entries, specifically who
sent each request and for which organization.
[1] <http://lists.debian.org/87objissel.fsf@gismo.pca.it>
[2] I know that a better solution would have been to understand&reuse
the WML infrastructure (these files are already parsed to generate
the correct index), but I did not have the time for that, sorry.
I thus discovered some discrepancies:
- the line containing the contact information is not "standard",
i.e. not always "# From: NAME <EMAIL>". Moreover, some names were not
completely "standard" either, e.g. lowercase letters or extra
quotes[3].
- some files contain HTML-encoded accented characters while others do
not, which seemed strange given that the README[4] states:
    Each file in these directories will create a link from the /users/ page,
    showing the content of the <pagetitle> tag. BE CAREFUL - the <pagetitle>
    is added verbatim, that means it MUST NOT contain any 8bit characters (in
    the english tree) because these titles are put into the translated pages
    when there is no translation of the file itself and create wrong
    characters.
    AGAIN: DO NOT put any 8BIT CHARACTERS into the <pagetitle>.
This was even stranger to me, since Debian has been UTF-8-aware for a
while and the migration of the website to UTF-8 was completed[5].
[3] I know this may sound like nitpicking, but for automatic parsing (and
consistency) I consider it a bug.
[4] <http://anonscm.debian.org/viewvc/webwml/webwml/english/users/README?revision=1.4&view=markup>
[5] <http://bugs.debian.org/567781>
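The first discrepancy could be checked mechanically. The following is only a minimal sketch: the function name list_nonstandard_from is hypothetical (it is not part of webwml), and the pattern assumes that a "standard" line is exactly "# From: NAME <EMAIL>" with nothing after the closing bracket.

```shell
#!/bin/sh
# Hypothetical helper, not part of webwml: print every .wml file in the
# given directory that has no line of the exact form "# From: NAME <EMAIL>".
list_nonstandard_from() {
    for f in "$1"/*.wml; do
        # skip the unexpanded pattern when the directory has no .wml files
        [ -e "$f" ] || continue
        # assumed "standard" form: non-empty name, one space, <user@host>
        if ! grep -q '^# From: [^<][^<]* <[^<>@ ]*@[^<>@ ]*>$' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling it as list_nonstandard_from english/users/com would list the files to review by hand; it does not try to judge whether the name itself is well formed (lowercase letters, extra quotes).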
Two examples:
--8<---------------cut here---------------start------------->8---
Index: com/alcove.wml
===================================================================
RCS file: /cvs/webwml/webwml/english/users/com/alcove.wml,v
retrieving revision 1.2
diff -u -r1.2 alcove.wml
--- com/alcove.wml 10 Sep 2007 07:38:07 -0000 1.2
+++ com/alcove.wml 19 Nov 2012 20:37:37 -0000
@@ -1,12 +1,12 @@
# From: Yann Dirson <ydirson@fr.alcove.com>
-<define-tag pagetitle>Alc&ocirc;ve, France</define-tag>
+<define-tag pagetitle>Alcôve, France</define-tag>
<define-tag webpage>http://www.alcove.com/</define-tag>
#use wml::debian::users
<p>
- Here at Alc&ocirc;ve, we use Debian for all of our infrastructure and
+ Here at Alcôve, we use Debian for all of our infrastructure and
development workstations, totalling over 30 machines. We also
recommend Debian to our customers for most situations, although we
also install other distributions if they so desire.
Index: edu/unieconomicspoznan.wml
===================================================================
RCS file: /cvs/webwml/webwml/english/users/edu/unieconomicspoznan.wml,v
retrieving revision 1.2
diff -u -r1.2 unieconomicspoznan.wml
--- edu/unieconomicspoznan.wml 26 May 2011 10:05:50 -0000 1.2
+++ edu/unieconomicspoznan.wml 19 Nov 2012 20:37:37 -0000
@@ -1,4 +1,4 @@
-# Maciej So³tysiak <maciej.soltysiak@ae.poznan.pl>
+# From: Maciej Sołtysiak <maciej.soltysiak@ae.poznan.pl>
<define-tag pagetitle>University of Economics in Poznan, Poland</define-tag>
<define-tag webpage>http://www.ae.poznan.pl/</define-tag>
--8<---------------cut here---------------end--------------->8---
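The second discrepancy (HTML entities in some titles, raw accented characters in others) could be surveyed per file. This is a rough sketch under assumptions: classify_pagetitle is a hypothetical name, only the line containing "pagetitle>" is inspected, and anything outside printable ASCII is reported as "8bit" regardless of its actual encoding.

```shell
#!/bin/sh
# Hypothetical helper: report how the pagetitle line of a .wml file
# encodes non-ASCII characters. Prints "entity", "8bit" or "ascii".
classify_pagetitle() {
    # the C locale makes the tools work byte by byte, whatever the encoding
    title=$(LC_ALL=C sed -n '/pagetitle>/p' "$1")
    if printf '%s\n' "$title" | LC_ALL=C grep -q '&#\{0,1\}[A-Za-z0-9]\{1,\};'; then
        # named (&ocirc;) or numeric (&#244;) HTML entity
        echo entity
    elif printf '%s\n' "$title" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then
        # at least one byte outside printable ASCII
        echo 8bit
    else
        echo ascii
    fi
}
```

A title that mixes both forms is reported as "entity", since that test runs first.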
Given that I have already corrected all the entries for the DebConf
sponsors table anyway, I was wondering if we would like to apply these
corrections, which also means that the README[4] file needs to be
updated. Obviously, any error resulting from such changes would be
mine ;-)
NB, I have not checked languages other than English, nor have I tried to
rebuild the full website. But given that the migration to UTF-8 is
completed[5], I would be surprised if the above changes generated any
error.
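Short of a full rebuild, one cheap sanity check is whether every corrected file is valid UTF-8. This is a sketch only, assuming that iconv is installed and that it exits with a non-zero status on an invalid byte sequence; list_invalid_utf8 is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print every .wml file in the given directory that
# is not valid UTF-8, e.g. a file still carrying a stray Latin-2 byte.
list_invalid_utf8() {
    for f in "$1"/*.wml; do
        # skip the unexpanded pattern when the directory has no .wml files
        [ -e "$f" ] || continue
        # converting UTF-8 to UTF-8 fails on invalid input
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

An empty result only shows that the files decode cleanly; it says nothing about how the WML build itself handles them.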
Comments?
Thx, bye,
Gismo / Luca
#!/bin/sh
#
# extract-debian-users.sh, extract information from webwml files used
# to build www.debian.org/users/ available at
# <http://anonscm.debian.org/viewvc/webwml/webwml/english/users/>
# Copyright (C) 2012 Luca Capello <luca@pca.it>
# Version:
# 2012-11-19: 0.1
set -e
if [ -z "$1" ]; then
    echo "Usage: $0 directory [committer]"
    exit 1
elif [ ! -d "$1" ]; then
    echo "$1 is not a directory"
    exit 2
else
    # remove trailing '/'
    DIRECTORY=$(echo "$1" | sed -e 's/\/$//')
fi
if [ -n "$2" ]; then
    COMMITTER="$2"
else
    COMMITTER="$USER"
fi
# description of the output
cat <<EOF
From <http://www.debian.org/users/$(basename "$DIRECTORY")>
======================================
EOF
DATE=$(date +%Y-%m-%d)
for I in "$DIRECTORY"/*.wml; do
# '|| true' keeps 'set -e' from aborting on files without a "# From:" line
FROM=$(grep "^# From:" "$I" || true)
CONTACT=$(echo "$FROM" | sed -e 's/\(.*\)<//' -e 's/>\(.*\)//')
PERSON=$(echo "$FROM" | sed -e 's/\(.*\)://' -e 's/<\(.*\)//' -e 's/^ //' -e 's/ $//')
TITLE=$(grep "pagetitle>" "$I" | sed -e 's/\(.*\)pagetitle>//' -e 's/<\/define\(.*\)//')
WEBSITE=$(grep "webpage>" "$I" | sed -e 's/\(.*\)webpage>//' -e 's/<\/define\(.*\)//')
LINK=$(echo "$I" | sed -e 's/\(.*\)users\///' -e 's/\.wml//')
cat <<EOF
:$TITLE
$COMMITTER: $DATE: contact is $CONTACT
person is $PERSON
website is $WEBSITE
source is http://www.debian.org/users/$LINK
EOF
done