
Re: Bug#668556: ITP: dparser -- a scannerless GLR parser generator



* Markus Wanner <markus@bluegap.ch> [120414 13:32]:
> On 04/14/2012 11:22 AM, Jakub Wilk wrote:
> > Sure, they are also much more common than GLR. And if you are "just"
> > interested in parsing and not a computer scientists, there's a chance
> > you've never heard about any of them.
>
> Based on two votes for extending the acronyms, I propose to change the
> long description as follows:
>
>  DParser is a scannerless, generalized left-to-right, rightmost
>  deviation (GLR) parser generator based on the Tomita algorithm. It is
>  self-hosted and very easy to use. Grammars are written in a natural
>  style of extended Backus-Naur form (EBNF) and regular expressions and
>  support both speculative and final actions.
>
> I'm not a native speaker, so please feel free to comment on spelling,
> grammar, comma or other errors.

I'm not sure that is that much more understandable for people not
knowing the concepts or for people trying to understand what this
package does and why some other admin installed it (or why a user
requests its installation).

Let's take a look at the start of bison's description:
 Bison is a general-purpose parser generator that converts a
 grammar description for an LALR(1) context-free grammar into a C
 program to parse that grammar.

That sentence also describes what a parser generator is (so even
someone not so deep into computer science knows what this package
is about), while still containing the details and not being much longer.

I personally cannot get much more out of "generalized
left-to-right, rightmost deviation (GLR) parser generator" than
out of "GLR parser"; I'd need to look it up anyway, so "GLR parser
generator" is quite fine in my eyes.

For the grammar I personally would prefer it expanded, though
I think it is more understandable as "EBNF (extended Backus-Naur form)
grammar" than the other way around.

Other questions I ask myself when looking at the description:

What is "based on the Tomita algorithm" about? Wikipedia tells me
GLR parser generators are based on the work of Tomita, so is that a
description of "GLR parser", or is it a somehow important implementation
detail that this is based on the original and not on some reinvention
or a totally different algorithm one would also call a GLR parser?
Would that information be important for anyone deciding whether they
want to install that package or not?

What is "a natural style of EBNF" supposed to mean? Are there any
unnatural styles of EBNF? Does it mean I can write something I
would be able to identify as EBNF, in contrast to other parser
generators? As EBNF is already a notation for expressing grammars,
what could lack this "natural style" and still be called EBNF?
That might be only my ignorance, so my question is: "Does this
'natural style of' give anyone any information they would want when
deciding whether to install or uninstall this package?"

What is "scannerless"? After considering that "does not contain a
scanner" does not make much sense, I guess it means I do not need
to run something like flex first to translate the input into tokens?
Perhaps that can also be expressed in a more generally understandable way.
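
To make that guess concrete, here is a minimal sketch of the two
approaches for a toy grammar (integers joined by "+"). It is a
hand-written illustration only: it is not DParser's API or its
algorithm, it is not a GLR parser, and the function names
(tokenize, parse_tokens, parse_scannerless) are made up for this
example.

```python
import re


def tokenize(text):
    # Separate scanner pass (the job flex would do): characters -> tokens.
    return re.findall(r"\d+|\+", text)


def parse_tokens(tokens):
    # Parser that only ever sees tokens: NUMBER ('+' NUMBER)*
    total = int(tokens[0])
    pos = 1
    while pos < len(tokens):
        if tokens[pos] != "+":
            raise ValueError("expected '+'")
        total += int(tokens[pos + 1])
        pos += 2
    return total


def parse_scannerless(text):
    # Scannerless: the same grammar is matched directly against the
    # characters, whitespace included; no token list is ever built.
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def number():
        nonlocal pos
        skip_ws()
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("expected a number at offset %d" % start)
        return int(text[start:pos])

    total = number()
    skip_ws()
    while pos < len(text) and text[pos] == "+":
        pos += 1
        total += number()
        skip_ws()
    if pos != len(text):
        raise ValueError("unexpected input at offset %d" % pos)
    return total
```

Both parse_tokens(tokenize(s)) and parse_scannerless(s) accept the
same input; the difference is only whether a separate tokenizing
step exists.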

If all my guesses above are right, that would result in something
like:

 DParser is a scannerless GLR parser generator that converts
 a grammar given in EBNF (extended Backus-Naur form) and
 regular expressions into <whatever this translates to> to parse
 that grammar without needing an extra scanner to tokenize the
 input.

(With perhaps "GLR" being replaced with "GLR (generalized left-to-right,
rightmost derivation)" and perhaps "to tokenize the input" removed).

