--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: bibtex2html: Accents lexing/parsing
- From: Samuel Colin <samuel.colin@loria.fr>
- Date: Fri, 22 Feb 2008 22:18:24 +0100
- Message-id: <20080222211824.30243.97020.reportbug@hebus>
Package: bibtex2html
Version: 1.91-1
Severity: minor
Hi,
while using bib2bib I discovered that the õ character was not handled, so
I added it to latex_accents.mll.
I also made the following changes to it:
- I added other Latin-1 diacritics (Ç, Ã, etc.)
- I removed the "\\I" "letters": to my knowledge only \i exists, and its
purpose is to remove the dot above the "i". There is no need for a \I, as
"I" already lacks this dot
- I added "\\i}" because the lexer was not able to handle entries like:
author = {Col{\"\i}n},
for instance. The first "{" is consumed by next_char, but once "\\"" has
been lexed, quote_char does not know about "\\i}", hence my addition
- I also added the "{I}" char
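To make the "\\i}" case above concrete, here is a minimal sketch in plain
OCaml (not ocamllex, and not code from bibtex2html). The names starts_at,
i_forms and convert are hypothetical, and the sketch assumes UTF-8 output
and accepts only a single space after \i, where the real rule uses space+.

```ocaml
(* Hypothetical helper: true when [s] has [prefix] starting at index [pos]. *)
let starts_at s pos prefix =
  let n = String.length prefix in
  pos + n <= String.length s && String.sub s pos n = prefix

(* Spellings of "i" accepted after \" in this sketch, longest first.
   "\\i}" is the form that appears once the opening "{" has been dropped. *)
let i_forms = [ "{\\i}"; "\\i}"; "\\i "; "{i}"; "i" ]

(* Replace \" followed by one of [i_forms] with "ï". Bare braces are
   dropped, as the '{' and '}' cases of next_char do in the diff; any other
   character is copied unchanged. *)
let convert s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go pos =
    if pos >= len then ()
    else if starts_at s pos "\\\"" then
      match List.find_opt (starts_at s (pos + 2)) i_forms with
      | Some form ->
        Buffer.add_string buf "ï";
        go (pos + 2 + String.length form)
      | None ->
        (* Unknown argument: keep the accent command as it was. *)
        Buffer.add_string buf "\\\"";
        go (pos + 2)
    else if s.[pos] = '{' || s.[pos] = '}' then go (pos + 1)
    else begin
      Buffer.add_char buf s.[pos];
      go (pos + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

With this sketch, convert applied to Col{\"\i}n consumes the "\i}" form as a
whole, so the closing brace does not end up as a stray character.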
I hope I did not misinterpret the inner workings of latex_accents.mll; see
the attached diff.
On that note, I also discovered that fields like:
author = {Tr{\" e}ma and Cl{\' e}s},
were not correctly matched by a regex condition. One of the causes seems to
be that latex_accents.mll does not take inner spaces into account. Other
experiments also seem to point to something in condition_lexer and/or
bibtex_lexer, although I'm far from sure.
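Purely to illustrate the inner-space issue, here is a minimal sketch in
plain OCaml of a pre-pass that removes blanks between an accent command and
its argument, so that {\" e} and {\"e} look the same to later stages. This is
an assumption about one possible approach, not a description of how
bibtex2html handles it; is_accent and squeeze_accent_spaces are hypothetical
names.

```ocaml
(* Accent characters that may follow a backslash in the commands above. *)
let is_accent c = c = '"' || c = '\'' || c = '`' || c = '^' || c = '~'

(* Hypothetical pre-pass: drop spaces and tabs between an accent command
   and its argument. Note: it does not treat "\\\\" (a LaTeX line break)
   specially, which a real implementation would have to consider. *)
let squeeze_accent_spaces s =
  let len = String.length s in
  let buf = Buffer.create len in
  (* Index of the first non-blank character at or after [pos]. *)
  let rec skip_blanks pos =
    if pos < len && (s.[pos] = ' ' || s.[pos] = '\t') then skip_blanks (pos + 1)
    else pos
  in
  let rec go pos =
    if pos >= len then ()
    else if s.[pos] = '\\' && pos + 1 < len && is_accent s.[pos + 1] then begin
      (* Copy the two-character accent command, then skip the blanks. *)
      Buffer.add_char buf '\\';
      Buffer.add_char buf s.[pos + 1];
      go (skip_blanks (pos + 2))
    end else begin
      Buffer.add_char buf s.[pos];
      go (pos + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

Spaces elsewhere in the field, such as those around "and", are left alone;
only the blanks directly after an accent command are removed.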
I got very confused between the OCaml character escapes, the escapes I had
to do in my shell, the escapes in the regex, and all the lexers, so I will
not attempt to touch it and will trust upstream here :-)
-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.22
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages bibtex2html depends on:
ii ocaml-base-nox [ocaml-base-no 3.10.0-13 Runtime system for ocaml bytecode
ii perl 5.8.8-12 Larry Wall's Practical Extraction
ii texlive-base 2007-13 TeX Live: Essential programs and f
bibtex2html recommends no packages.
-- no debconf information
--- latex_accents.mll.backup 2008-02-22 19:09:59.000000000 +0100
+++ latex_accents.mll 2008-02-22 20:03:46.000000000 +0100
@@ -37,7 +37,13 @@
| '{' { next_char lexbuf }
| '}' { next_char lexbuf }
| 'ç' { add_string "ç" ; next_char lexbuf }
+ | 'Ç' { add_string "Ç" ; next_char lexbuf }
| 'ñ' { add_string "ñ"; next_char lexbuf }
+ | 'Ñ' { add_string "Ñ"; next_char lexbuf }
+ | 'ã' { add_string "ã"; next_char lexbuf }
+ | 'Ã' { add_string "Ã"; next_char lexbuf }
+ | 'õ' { add_string "õ"; next_char lexbuf }
+ | 'Õ' { add_string "Õ"; next_char lexbuf }
| 'ä' { add_string "ä"; next_char lexbuf }
| 'ö' { add_string "ö"; next_char lexbuf }
| 'ü' { add_string "ü"; next_char lexbuf }
@@ -90,25 +96,27 @@
| '`' { left_accent lexbuf }
| '^' { hat lexbuf }
| "c{c}" { add_string "ç" ; next_char lexbuf }
+| "c{C}" { add_string "Ç" ; next_char lexbuf }
| 'v' { czech lexbuf }
-| ("~n"|"~{n}") { add_string "ñ"; next_char lexbuf }
+| '~' { tilde lexbuf }
| _ { add_string "\\" ; add lexbuf ; next_char lexbuf }
| eof { add_string "\\" }
(* called when we have seen "\\\"" *)
and quote_char = parse
- ('a'|"{a}") { add_string "ä" ; next_char lexbuf }
-| ('o'|"{o}") { add_string "ö" ; next_char lexbuf }
-| ('u'|"{u}") { add_string "ü" ; next_char lexbuf }
-| ('e'|"{e}") { add_string "ë" ; next_char lexbuf }
-| ('A'|"{A}") { add_string "Ä" ; next_char lexbuf }
-| ('O'|"{O}") { add_string "Ö" ; next_char lexbuf }
-| ('U'|"{U}") { add_string "Ü" ; next_char lexbuf }
-| ('E'|"{E}") { add_string "Ë" ; next_char lexbuf }
-| ("\\i" space+|"{\\i}") { add_string "ï" ; next_char lexbuf }
-| ('I'|"\\I" space+|"{\\I}") { add_string "Ï" ; next_char lexbuf }
-| _ { add_string "\\\"" ; add lexbuf }
-| eof { add_string "\\\"" }
+ ('a'|"{a}") { add_string "ä" ; next_char lexbuf }
+| ('o'|"{o}") { add_string "ö" ; next_char lexbuf }
+| ('u'|"{u}") { add_string "ü" ; next_char lexbuf }
+| ('e'|"{e}") { add_string "ë" ; next_char lexbuf }
+| ('A'|"{A}") { add_string "Ä" ; next_char lexbuf }
+| ('O'|"{O}") { add_string "Ö" ; next_char lexbuf }
+| ('U'|"{U}") { add_string "Ü" ; next_char lexbuf }
+| ('E'|"{E}") { add_string "Ë" ; next_char lexbuf }
+| ('i'|"{i}"|"\\i" space+|"{\\i}"|"\\i}")
+ { add_string "ï" ; next_char lexbuf }
+| ('I'|"{I}") { add_string "Ï" ; next_char lexbuf }
+| _ { add_string "\\\"" ; add lexbuf }
+| eof { add_string "\\\"" }
(* called when we have seen "\\'" *)
and right_accent = parse
@@ -120,9 +128,10 @@
| ('O'|"{O}") { add_string "Ó" ; next_char lexbuf }
| ('U'|"{U}") { add_string "Ú" ; next_char lexbuf }
| ('E'|"{E}") { add_string "É" ; next_char lexbuf }
-| ('\'') { add_string "”" ; next_char lexbuf }
-| ('i'|"\\i" space+|"{\\i}") { add_string "í" ; next_char lexbuf }
-| ('I'|"\\I" space+|"{\\I}") { add_string "Í" ; next_char lexbuf }
+| ('\'') { add_string "”" ; next_char lexbuf }
+| ('i'|"{i}"|"\\i" space+|"{\\i}"|"\\i}")
+ { add_string "í" ; next_char lexbuf }
+| ('I'|"{I}") { add_string "Í" ; next_char lexbuf }
| _ { add_string "\\'" ; add lexbuf ; next_char lexbuf }
| eof { add_string "\\'" }
@@ -136,12 +145,14 @@
| ('O'|"{O}") { add_string "Ò" ; next_char lexbuf }
| ('U'|"{U}") { add_string "Ù" ; next_char lexbuf }
| ('E'|"{E}") { add_string "È" ; next_char lexbuf }
-| ('`') { add_string "“" ; next_char lexbuf }
-| ('i'|"\\i" space+ |"{\\i}") { add_string "ì" ; next_char lexbuf }
-| ('I'|"\\I" space+ |"{\\I}") { add_string "Ì" ; next_char lexbuf }
+| ('`') { add_string "“" ; next_char lexbuf }
+| ('i'|"{i}"|"\\i" space+ |"{\\i}"|"\\i}")
+ { add_string "ì" ; next_char lexbuf }
+| ('I'|"{I}") { add_string "Ì" ; next_char lexbuf }
| _ { add_string "\\`" ; add lexbuf ; next_char lexbuf }
| eof { add_string "\\`" }
+(* called when we have seen "\\^" *)
and hat = parse
('a'|"{a}") { add_string "â" ; next_char lexbuf }
| ('o'|"{o}") { add_string "ô" ; next_char lexbuf }
@@ -151,18 +162,32 @@
| ('O'|"{O}") { add_string "Ô" ; next_char lexbuf }
| ('U'|"{U}") { add_string "Û" ; next_char lexbuf }
| ('E'|"{E}") { add_string "Ê" ; next_char lexbuf }
-| ('i'|"\\i" space+ |"{\\i}") { add_string "î" ; next_char lexbuf }
-| ('I'|"\\I" space+ |"{\\I}") { add_string "Î" ; next_char lexbuf }
+| ('i'|"{i}"|"\\i" space+ |"{\\i}"|"\\i}")
+ { add_string "î" ; next_char lexbuf }
+| ('I'|"{I}") { add_string "Î" ; next_char lexbuf }
| _ { add_string "\\^" ; add lexbuf ; next_char lexbuf }
| eof { add_string "\\^" }
+(* called when we have seen "\\~" *)
+and tilde = parse
+ ('a'|"{a}") { add_string "ã" ; next_char lexbuf }
+| ('o'|"{o}") { add_string "õ" ; next_char lexbuf }
+| ('A'|"{A}") { add_string "Ã" ; next_char lexbuf }
+| ('O'|"{O}") { add_string "Õ" ; next_char lexbuf }
+| ('n'|"{n}") { add_string "ñ" ; next_char lexbuf }
+| ('N'|"{N}") { add_string "Ñ" ; next_char lexbuf }
+| _ { add_string "\\~" ; add lexbuf ; next_char lexbuf }
+| eof { add_string "\\~" }
+
+(* called when we have seen "\\v" *)
and czech = parse
('r'|"{r}") { add_string "ř" ; next_char lexbuf }
| ('R'|"{R}") { add_string "Ř" ; next_char lexbuf }
| ('s'|"{s}") { add_string "š" ; next_char lexbuf }
| ('S'|"{S}") { add_string "Š" ; next_char lexbuf }
-| ('i'|"\\i" space+ |"{\\i}") { add_string "ĭ" ; next_char lexbuf }
-| ('I'|"\\I" space+ |"{\\I}") { add_string "Ĭ" ; next_char lexbuf }
+| ('i'|"{i}"|"\\i" space+ |"{\\i}"|"\\i}")
+ { add_string "ĭ" ; next_char lexbuf }
+| ('I'|"{I}") { add_string "Ĭ" ; next_char lexbuf }
| _ { add_string "\\^" ; add lexbuf ; next_char lexbuf }
| eof { add_string "\\^" }
--- End Message ---
--- Begin Message ---
Source: bibtex2html
Source-Version: 1.92-1
We believe that the bug you reported is fixed in the latest version of
bibtex2html, which is due to be installed in the Debian FTP archive:
bibtex2html_1.92-1.diff.gz
to pool/main/b/bibtex2html/bibtex2html_1.92-1.diff.gz
bibtex2html_1.92-1.dsc
to pool/main/b/bibtex2html/bibtex2html_1.92-1.dsc
bibtex2html_1.92-1_all.deb
to pool/main/b/bibtex2html/bibtex2html_1.92-1_all.deb
bibtex2html_1.92.orig.tar.gz
to pool/main/b/bibtex2html/bibtex2html_1.92.orig.tar.gz
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 467082@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Ralf Treinen <treinen@debian.org> (supplier of updated bibtex2html package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Format: 1.8
Date: Tue, 12 Aug 2008 01:43:30 +0200
Source: bibtex2html
Binary: bibtex2html
Architecture: source all
Version: 1.92-1
Distribution: unstable
Urgency: low
Maintainer: Debian OCaml Maintainers <debian-ocaml-maint@lists.debian.org>
Changed-By: Ralf Treinen <treinen@debian.org>
Description:
bibtex2html - filters BibTeX files and translates them to HTML
Closes: 467082
Changes:
bibtex2html (1.92-1) unstable; urgency=low
.
* New upstream version. This release fixes a bug with accent parsing
and conversion (closes: Bug#467082).
* Adapted patch 03_charset to new upstream version.
* Standards-Version 3.8.0 (no change).
Checksums-Sha1:
74f3f3f8c5cf159ea2641af084624386458e56b4 1511 bibtex2html_1.92-1.dsc
37b95ed2d9427f0289939d46af6839453db60794 69800 bibtex2html_1.92.orig.tar.gz
beaeff49cf9c8c732ed811587618c2924309bd37 11709 bibtex2html_1.92-1.diff.gz
7c0cf5734293d946807cba5d6c65c0eab35ef7c4 135772 bibtex2html_1.92-1_all.deb
Checksums-Sha256:
935bcedb8f6ca00e1f3e79a6824891a45900c694c7bbf1090084fbc8bc76c2aa 1511 bibtex2html_1.92-1.dsc
3410acb7c01871a48fb4b483a3d93ade49e7fde2ce6d2c19daa3733c734caaea 69800 bibtex2html_1.92.orig.tar.gz
32ef2f635c3a36ea705cafe2b08258e611fa20f1692888503ef1cfaad0d7d6c5 11709 bibtex2html_1.92-1.diff.gz
d5709fee96f43eaf97e51b1d46514f2003439787fd69c463501d25f1f612e011 135772 bibtex2html_1.92-1_all.deb
Files:
3d25a0a26813dc11f60bd55a6d58f99e 1511 tex optional bibtex2html_1.92-1.dsc
9d69980f595be02a79a96a851d79bb88 69800 tex optional bibtex2html_1.92.orig.tar.gz
736bc45e0bb5e60fae66fe80255a0521 11709 tex optional bibtex2html_1.92-1.diff.gz
7164f919a7f48894c1c1abc5eec4149e 135772 tex optional bibtex2html_1.92-1_all.deb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iD8DBQFIqzzttzWmSeC6BMERApzLAJ9Rx+35YbWJpl4OebrKU7BQ8ELBzQCg+G59
ymVQzKkCw7WjXS2Mpvfp7aA=
=gjTF
-----END PGP SIGNATURE-----
--- End Message ---