
Re: Consistent formatting long descriptions as input data



[ resent ]

On Thu, Apr 23, 2009 at 08:06:28PM -0500, Manoj Srivastava wrote:
> >> I suspect the answer might be to get a working implementation out
> >> in the wild (it does not have to be packages.d.o or anything
> >> official -- even a standalone software that takes the output from
> >> grep-dctrl or parses a Packages file will suffice)
> > Would you consider the tasks pages I announced yesterday [1] as
> > such an implementation.
>         Sure. It would be great to have another implementation, perhaps
>  one that people can play with (something that, for example, one can
>  pipe the output of a grep-dctrl command to, and get an html snippet
>  from (hey, that can then be packaged as an ikiwiki plugin).

Please find one attached as the script "render-dctrl". Sample usage:

  grep-available -s Package,Depends,Description ocaml | render-dctrl > packages.html    

Sample "packages.html" (obtained with the command above) is available
at [1]. To run it you will need python-debian and python-markdown. The
script will be shipped as an example of python-debian starting from
the next release [2].

I have deliberately not looked at Andreas' implementation, in order to
see whether we have independently thought of different issues. That
also means that it may be utterly buggy; you have been warned :-)

Already in the sample output at [1] you can find what I believe will
be the most problematic issue to deal with (see the
libocamlbricks-ocaml-dev package). Namely: multi-line list items whose
continuation lines are not indented and which are not surrounded by
blank lines. Arguably though, a long description using a list as in
the libocamlbricks-ocaml-dev package is already "horrible" per se, and
deserves to be fixed no matter what. The interesting snippet is as
follows:

> Library OCaml which provide a set of needed and useful macros for developing.
> Modules and functionality are the follows :
> .
>  - Configuration_files: Allow to get information from configuration files
>  - Environments: Environments are useful for maintaining the state, intendend
> as a set of bindings, of a user interaction with a GUI
>  - FilenameExtra: Additional features for the standard module Filename
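
To see why this snippet is troublesome, its list lines can be classified
with the same list-item regex and indentation helper that the attached
script uses. What follows is only a sketch in Python 3 using the
standard library: `classify` and its 'item'/'text' labels are
hypothetical names introduced here, and the lines are assumed to have
had their leading 822 space stripped already.

```python
import re

# Same patterns as in the attached script.
mdwn_list_line = re.compile(r'^(\s*)[\*\+\-]')  # Markdown list item line
padding = re.compile(r'^(\s*)')


def get_indent(s):
    """Return the number of leading whitespace characters of s."""
    return len(padding.match(s).group(1))


def classify(line):
    """Hypothetical helper: label a line 'item' or 'text', with its indent."""
    kind = 'item' if mdwn_list_line.match(line) else 'text'
    return kind, get_indent(line)


# List lines of the snippet, leading 822 space already stripped.
snippet = [
    ' - Configuration_files: Allow to get information from configuration files',
    ' - Environments: Environments are useful for maintaining the state, intendend',
    'as a set of bindings, of a user interaction with a GUI',
    ' - FilenameExtra: Additional features for the standard module Filename',
]

for line in snippet:
    print(classify(line), line)
```

The third line has no list marker and an indent of 0, lower than the
indent of 1 of the item it continues, so an indentation-based rule such
as the script's RULE 3 treats it as the end of the list rather than as
part of the "Environments" item.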

Cheers.

[1] http://upsilon.cc/~zack/stuff/packages.html
[2] http://git.debian.org/?p=pkg-python-debian/python-debian.git;a=commit;h=b90ffafd6a1806ab7e3e7620d1675a53ae38e66e

PS: mails like this one would be better stored in the log of a
   bug report against policy, to keep track of the
   status.
   Andreas: do we have one already? If not, can you please
   submit one with references to this thread? ... or else shout and
   I'll do that.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime
#!/usr/bin/python

# render-dctrl
# Copyright (C) 2009 Stefano Zacchiroli <zack@debian.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# Requirements (Debian packages): python-debian python-markdown

usage = """Usage: render-dctrl [OPTION ...] [FILE ...]

Render an 822-like listing of Debian packages (AKA "Packages" file) to
XHTML, rendering (long) descriptions as Markdown text.  Render text
coming from FILEs, if given, or from standard input otherwise. Typical
usage is within a dctrl-tools pipeline, example:

  grep-available -s Package,Depends,Description ocaml | render-dctrl > foo.html

Warning: beware of #525525 and thus avoid using "-s Description" alone."""

import re
import string
import sys
from debian_bundle import deb822
from markdown import markdown
from optparse import OptionParser

options = None		# global, for cmdline options

css = """
body { font-family: sans-serif; }
dt {
  font-weight: bold;
}
dd {
  margin-bottom: 5pt;
}
div.package {
  border: solid 1pt;
  margin-top: 10pt;
  padding-left: 2pt;
  padding-right: 2pt;
}
.raw {
  font-family: monospace;
  background: #ddd;
  padding-left: 2pt;
  padding-right: 2pt;
}
.shortdesc {
  text-decoration: underline;
  margin-bottom: 5pt;
  display: block;
}
.longdesc {
  background: #eee;
}
span.package {
  font-family: monospace;
  font-size: 110%;
}
.uid {
  float: right;
  font-size: x-small;
  padding-right: 10pt;
}
"""
html_header = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">%s</style>
  </head>
  <body>
""" % css
html_trailer = """  </body>
</html>
"""

mdwn_list_line = re.compile(r'^(\s*)[\*\+\-]')	# Markdown list item line
# mdwn_head_line = re.compile(r'^(\s*)#')	# Markdown header
padding = re.compile(r'^(\s*)')

def get_indent(s):
    m = padding.match(s)
    if m:
        return len(m.group(1))
    else:
        return 0

def render_longdesc(lines):
    print '<div class="longdesc">'
    lines = map(lambda s: s[1:], lines)	# strip 822 heading space
    curpara, paragraphs = [], []
    inlist, listindent = False, 0
    store_para = lambda: paragraphs.append(string.join(curpara, '\n') + '\n')

    for l in lines:	# recognize Markdown paragraphs
        if l.rstrip() == '.':	# RULE 1: split paragraphs at Debian's "."
            store_para()            
            curpara, inlist, listindent = [], False, 0
        else:
            m = mdwn_list_line.match(l)
            if not inlist and m and curpara:
                # RULE 2: handle list item *not* at paragraph beginning
                store_para()	# => start a new paragraph
                curpara, inlist, listindent = [l], True, get_indent(l)
            elif inlist and get_indent(l) <= listindent:
                # RULE 3: leave list when indentation is not deeper than the item's
                store_para()	# => start a new paragraph
                curpara, inlist, listindent = [l], False, 0
            else:
                curpara.append(l)

    if curpara:	# flush the last paragraph: no trailing "." triggers RULE 1
        store_para()
    for p in paragraphs:	# render paragraphs
        print markdown(p)
    print '</div>'

def render_field(field, val):
    field = field.lower()
    print '<dt>%s</dt>' % field
    print '<dd class="%s">' % field
    if field == 'description':
        lines = val.split('\n')
        print '<span class="shortdesc">%s</span>' % lines[0]
        render_longdesc(lines[1:])
    elif field == 'package':
        print '<a href="#%s" class="uid">id</a>' % val
        print '<span id="%s" class="package">%s</span>' % (val, val)
    elif field in []:	# fields not to be typeset as "raw"
        print '<span class="%s">%s</span>' % (field, val)
    else:
        print '<span class="raw">%s</span>' % val
    print '</dd>'

def render_file(f):
    global options, html_header, html_trailer

    if options.print_header:
        print html_header
    for pkg in deb822.Packages.iter_paragraphs(f):
        print '<div class="package">'
        print '<dl class="fields">'
        for (field, val) in pkg.iteritems():
            render_field(field, val)
        print '</dl>'
        print '</div>\n'
    if options.print_header:
        print html_trailer

def main():
    global options, usage

    parser = OptionParser(usage=usage)
    parser.add_option("-n", "--no-headers",
                      action="store_false", dest="print_header", default=True,
                      help="suppress printing of HTML header/trailer")
    (options, args) = parser.parse_args()
    if len(args):
        for fname in args:
            render_file(open(fname))
    else:
        render_file(sys.stdin)

if __name__ == '__main__':
    main()
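
The paragraph-recognition loop of render_longdesc can be tried in
isolation, without python-debian or python-markdown. The following is a
sketch under stated assumptions, not the script itself: it is written
for Python 3, `split_paragraphs` and `flush` are hypothetical names,
paragraphs are returned rather than rendered, empty paragraphs are
skipped, and whatever is left in the buffer is flushed once the loop
ends.

```python
import re

mdwn_list_line = re.compile(r'^(\s*)[\*\+\-]')  # Markdown list item line
padding = re.compile(r'^(\s*)')


def get_indent(s):
    """Return the number of leading whitespace characters of s."""
    return len(padding.match(s).group(1))


def split_paragraphs(lines):
    """Hypothetical helper: group long-description lines into paragraphs.

    `lines` are assumed to have the leading 822 space already stripped.
    """
    paragraphs, curpara = [], []
    inlist, listindent = False, 0

    def flush():
        # Store the lines collected so far, skipping empty paragraphs.
        if curpara:
            paragraphs.append('\n'.join(curpara))

    for l in lines:
        if l.rstrip() == '.':
            # RULE 1: split paragraphs at Debian's "."
            flush()
            curpara, inlist, listindent = [], False, 0
        elif not inlist and curpara and mdwn_list_line.match(l):
            # RULE 2: list item that is not at the beginning of a paragraph
            flush()
            curpara, inlist, listindent = [l], True, get_indent(l)
        elif inlist and get_indent(l) <= listindent:
            # RULE 3: line indented no deeper than the list item
            flush()
            curpara, inlist, listindent = [l], False, 0
        else:
            curpara.append(l)
    flush()  # the last paragraph is not followed by a "."
    return paragraphs


sample = [
    'Intro text',
    'continues here',
    '.',
    'Features:',
    ' * one',
    '   more on one',
    'Back to text',
    '.',
    'Last paragraph',
]

for p in split_paragraphs(sample):
    print(repr(p))
```

On this sample, RULE 1 splits at each ".", RULE 2 separates "Features:"
from the list item that follows it, and RULE 3 ends the list at
"Back to text".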
