Bug#724115: hunspell: FTBFS: POD error
On Tue, Nov 12, 2013 at 09:43:59PM +0100, Rene Engelhard wrote:
> On Tue, Nov 12, 2013 at 07:54:04PM +0100, Agustin Martin wrote:
> > I will have a look at this (I once wrote ispellaff2myspell). Now I think the
> > best is to change script to UTF8, but keep strings in code as escaped octal.
> > Or rewrite that part.
> >
> > Let me think about this. Hope to find time tomorrow.
>
> Oops, too late. Just added the patch as I saw the patch and did it before
> starting to read mail. My bad.
>
> Feel free to come up with a patch based on -5 and I'll happily add it, though.
Hi, Rene and Gregor
The patch is attached in two forms: a simple one that shows only the
changes I made, and the good one, which also has all trailing whitespace in
ispellaff2myspell trimmed. Minimally tested with the Faroese dictionary.
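
For readers who want to see why octal codes are safe here, a minimal sketch
follows. It is not the code in ispellaff2myspell: the helper name lc_latin1
is hypothetical, the character lists are compacted into ranges for brevity,
and it assumes the lists end up in a tr/// operator through a string eval.
The point is only that the source stays pure ASCII, so recoding the file to
UTF-8 cannot alter the single-byte values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: lowercase a string of raw
# latin1 bytes using character lists written with octal escapes.
sub lc_latin1 {
    my ($input) = @_;

    # Single quotes keep the backslashes literal, so these lists are plain
    # ASCII text regardless of the encoding of the source file.
    my $lowercase = 'a-z\340-\366\370-\376';
    my $uppercase = 'A-Z\300-\326\330-\336';

    # tr/// does not interpolate variables, so the lists are spliced in
    # through a string eval; tr then reads \340 etc. as single bytes.
    my $output = $input;
    eval "\$output =~ tr/$uppercase/$lowercase/; 1"
        or die "tr failed: $@";
    return $output;
}

# \311 is E-acute in latin1; its lowercase counterpart is \351.
my $result = lc_latin1("\311COLE");
print $result eq "\351cole" ? "ok\n" : "not ok\n";
```

Bytes outside both lists, such as \327 (the multiplication sign in latin1),
pass through unchanged.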
I also looked at myspell-tools. If I find time I will prepare a patch for
it as well, including the changes by Gregor. I see that ispellaff2myspell
is included there through a dpatch patch. Do you think it would be worth
changing the handling to something closer to what is used for
hunspell-tools (a plain file under debian/)?
Regards,
--
Agustin
diff --git a/debian/changelog b/debian/changelog
index 2ca1fbe..0572e6c 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+hunspell (1.3.2-6) unstable; urgency=low
+
+ * debian/ispellaff2myspell: New upstream version.
+ - Incorporate changes by Gregor Herrmann (UTF-8 and typo fixes).
+ - Use octal codes for unibyte strings to make them coexist
+ with new UTF-8 encoding.
+ - Other minor changes.
+
+ --
+
hunspell (1.3.2-5) unstable; urgency=low
* apply patch from Gregor Hermann, thanks
diff --git a/debian/ispellaff2myspell b/debian/ispellaff2myspell
index 692571c..940d82b 100644
--- a/debian/ispellaff2myspell
+++ b/debian/ispellaff2myspell
@@ -1,8 +1,7 @@
#!/usr/bin/perl -w
-# -*- coding: iso-8859-1 -*-
-# $Id: ispellaff2myspell,v 1.29 2005/07/04 12:21:55 agmartin Exp $
+# -*- coding: utf-8 -*-
#
-# (C) 2002-2005 Agustin Martin Domingo <agustin.martin@hispalinux.es>
+# (C) 2002-2013 Agustin Martin Domingo <agustin.martin@hispalinux.es>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -21,7 +20,7 @@
sub usage {
print "ispellaff2myspell: A program to convert ispell affix tables to myspell format
-(C) 2002-2005 Agustin Martin Domingo <agustin.martin\@hispalinux.es> License: GPL
+(C) 2002-2013 Agustin Martin Domingo <agustin.martin\@hispalinux.es> License: GPL2+
Usage:
ispellaff2myspell [options] <affixfile>
@@ -98,17 +97,17 @@ sub mylc{
}
} else {
if ( $charset eq "latin0" ){
- $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ½¨¸';
- $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ¼¦´';
+ $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376\275\250\270';
+ $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336\274\246\264';
} elsif ( $charset eq "latin1" ){
- $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
- $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+ $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+ $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
} elsif ( $charset eq "latin2" ){
- $lowercase='a-z±³µ¶¹º»¼¾¿àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
- $uppercase='A-Z¡£¥¦©ª«¬®¯ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+ $lowercase='a-z\261\263\265\266\271\272\273\274\276\277\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+ $uppercase='A-Z\241\243\245\246\251\252\253\254\256\257\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
} elsif ( $charset eq "latin3" ){
- $lowercase='a-z±¶¹º»¼¿àáâäåæçèéêëìíîïñòóôõö÷øùúûüýþ';
- $uppercase='A-Z¡¦©ª«¬¯ÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖ×ØÙÚÛÜÝÞ';
+ $lowercase='a-z\261\266\271\272\273\274\277\340\341\342\344\345\346\347\350\351\352\353\354\355\356\357\361\362\363\364\365\366\367\370\371\372\373\374\375\376';
+ $uppercase='A-Z\241\246\251\252\253\254\257\300\301\302\304\305\306\307\310\311\312\313\314\315\316\317\321\322\323\324\325\326\327\330\331\332\333\334\335\336';
# } elsif ( $charset eq "other_charset" ){
# die "latin2 still unimplemented";
} else {
@@ -440,13 +439,19 @@ requires B<--lowercase> having exactly that string but lowercase.
=back
-If your encoding is currently unsupported you can send me a file with
-the two strings of lower and uppercase chars. Note that they must match
-exactly but case changed. It will look something like
+If your encoding is currently unsupported you can send me a separate file
+with the two strings of lower and uppercase chars. Note that they must
+match exactly but case changed. It will look something like
$lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
$uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+A safer alternative against accidental recoding is to use octal codes for
+non 7bit chars. Above strings would then look like
+
+ $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+ $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
+
=head1 SEE ALSO
The OpenOffice.org Lingucomponent Project home page
diff --git a/debian/changelog b/debian/changelog
index 2ca1fbe..0572e6c 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+hunspell (1.3.2-6) unstable; urgency=low
+
+ * debian/ispellaff2myspell: New upstream version.
+ - Incorporate changes by Gregor Herrmann (UTF-8 and typo fixes).
+ - Use octal codes for unibyte strings to make them coexist
+ with new UTF-8 encoding.
+ - Other minor changes.
+
+ --
+
hunspell (1.3.2-5) unstable; urgency=low
* apply patch from Gregor Hermann, thanks
diff --git a/debian/ispellaff2myspell b/debian/ispellaff2myspell
index 692571c..216ec75 100644
--- a/debian/ispellaff2myspell
+++ b/debian/ispellaff2myspell
@@ -1,9 +1,8 @@
#!/usr/bin/perl -w
-# -*- coding: iso-8859-1 -*-
-# $Id: ispellaff2myspell,v 1.29 2005/07/04 12:21:55 agmartin Exp $
-#
-# (C) 2002-2005 Agustin Martin Domingo <agustin.martin@hispalinux.es>
-#
+# -*- coding: utf-8 -*-
+#
+# (C) 2002-2013 Agustin Martin Domingo <agustin.martin@hispalinux.es>
+#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
@@ -21,23 +20,23 @@
sub usage {
print "ispellaff2myspell: A program to convert ispell affix tables to myspell format
-(C) 2002-2005 Agustin Martin Domingo <agustin.martin\@hispalinux.es> License: GPL
+(C) 2002-2013 Agustin Martin Domingo <agustin.martin\@hispalinux.es> License: GPL2+
Usage:
ispellaff2myspell [options] <affixfile>
Options:
--affixfile=s Affix file
- --bylocale Use current locale setup for upper/lowercase
+ --bylocale Use current locale setup for upper/lowercase
conversion
- --charset=s Use specified charset for upper/lowercase
+ --charset=s Use specified charset for upper/lowercase
conversion (defaults to latin1)
--debug Print debugging info
--extraflags Allow some non alphabetic flags
--lowercase=s Lowercase string
--myheader=s Header file
- --printcomments Print commented lines in output
- --replacements=s Replacements file
+ --printcomments Print commented lines in output
+ --replacements=s Replacements file
--split=i Split flags with more that i entries
--uppercase=s Uppercase string
--wordlist=s Still unused
@@ -62,7 +61,7 @@ sub debugprint {
sub shipoutflag{
my $flag_entries=scalar @flag_array;
-
+
if ( $flag_entries != 0 ){
if ( $split ){
while ( @flag_array ){
@@ -92,23 +91,23 @@ sub mylc{
my $outputstring;
if ( $bylocale ){
- {
+ {
use locale;
$outputstring = lc $inputstring;
}
} else {
if ( $charset eq "latin0" ){
- $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ½¨¸';
- $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ¼¦´';
+ $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376\275\250\270';
+ $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336\274\246\264';
} elsif ( $charset eq "latin1" ){
- $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
- $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+ $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+ $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
} elsif ( $charset eq "latin2" ){
- $lowercase='a-z±³µ¶¹º»¼¾¿àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
- $uppercase='A-Z¡£¥¦©ª«¬®¯ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+ $lowercase='a-z\261\263\265\266\271\272\273\274\276\277\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+ $uppercase='A-Z\241\243\245\246\251\252\253\254\256\257\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
} elsif ( $charset eq "latin3" ){
- $lowercase='a-z±¶¹º»¼¿àáâäåæçèéêëìíîïñòóôõö÷øùúûüýþ';
- $uppercase='A-Z¡¦©ª«¬¯ÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖ×ØÙÚÛÜÝÞ';
+ $lowercase='a-z\261\266\271\272\273\274\277\340\341\342\344\345\346\347\350\351\352\353\354\355\356\357\361\362\363\364\365\366\367\370\371\372\373\374\375\376';
+ $uppercase='A-Z\241\246\251\252\253\254\257\300\301\302\304\305\306\307\310\311\312\313\314\315\316\317\321\322\323\324\325\326\327\330\331\332\333\334\335\336';
# } elsif ( $charset eq "other_charset" ){
# die "latin2 still unimplemented";
} else {
@@ -116,7 +115,7 @@ sub mylc{
die "Unsupported charset [$charset]
use explicitly --lowercase=string and --uppercase=string
-options. Remember that both string must match exactly, but
+options. Remember that both string must match exactly, but
case changed.
";
}
@@ -136,17 +135,17 @@ sub validate_flag (){
if ($flag =~ m/^$_/){
$flag =~ s/^$_//;
return $flag;
- }
+ }
}
- }
+ }
return '';
}
sub process_replacements{
my $file = shift;
my @replaces = ();
-
- open (REPLACE,"< $file") ||
+
+ open (REPLACE,"< $file") ||
die "Error: Could not open replacements file: $file\n";
while (<REPLACE>){
next unless m/^REP[\s\t]*\D.*/;
@@ -178,7 +177,7 @@ $debug = '';
$lowercase = '';
$myheader = '';
$printcomments = '';
-$replacements = '';
+$replacements = '';
$split = '';
$uppercase = '';
$wordlist = '';
@@ -218,7 +217,7 @@ if ( not $affixfile ){
if ( $charset and ( $lowercase or $uppercase )){
die "Error: charset and lowercase/uppercase options
-are incompatible. Use either charset or lowercase/uppercase options to
+are incompatible. Use either charset or lowercase/uppercase options to
specify the patterns
"
} elsif ( not $lowercase and not $uppercase and not $charset ){
@@ -231,7 +230,7 @@ if ( scalar(keys %theextraflags) == 0 && $hasextraflags ){
debugprint "$affixfile $charset";
-open (AFFIXFILE,"< $affixfile") ||
+open (AFFIXFILE,"< $affixfile") ||
die "Error: Could not open affix file: $affixfile";
if ( $myheader ){
@@ -259,7 +258,7 @@ while (<AFFIXFILE>){
s/^[\s\t]*flag[\s\t]*//;
s/[\s\t]*:.*$//;
debugprint "Found flag $_ in line $.\n";
-
+
if (/\*/){
s/[\*\s]//g;
$flagcombine="Y";
@@ -267,7 +266,7 @@ while (<AFFIXFILE>){
} else {
$flagcombine="N";
}
-
+
if ( $flagname = &validate_flag($_) ){
$myaffix = $affix;
} else {
@@ -278,11 +277,11 @@ while (<AFFIXFILE>){
} elsif ( $affix and $inflags ) {
($rootname,@comments) = split('#',$_);
$comment = '# ' . join('#',@comments);
-
+
$rootname =~ s/\s*//g;
$rootname = mylc $rootname;
($rootname,$addtoroot) = split('>',$rootname);
-
+
if ( $addtoroot =~ s/^\-//g ){
($rootremove,$addtoroot) = split(',',$addtoroot);
$addtoroot = "0" unless $addtoroot;
@@ -295,15 +294,15 @@ while (<AFFIXFILE>){
if ( $rootname eq '.' && $rootremove ne "0" ){
$rootname = $rootremove;
}
-
+
debugprint "$rootname, $addtoroot, $rootremove\n";
if ( $printcomments ){
$affix_line=sprintf("%s %s %-5s %-11s %-24s %s",
- $myaffix, $flagname, $rootremove,
+ $myaffix, $flagname, $rootremove,
$addtoroot, $rootname, $comment);
} else {
$affix_line=sprintf("%s %s %-5s %-11s %s",
- $myaffix, $flagname, $rootremove,
+ $myaffix, $flagname, $rootremove,
$addtoroot, $rootname);
}
$rootremove = "0";
@@ -340,23 +339,23 @@ B<ispellaff2myspell> - A program to convert ispell affix tables to myspell forma
Options:
--affixfile=s Affix file
- --bylocale Use current locale setup for upper/lowercase
+ --bylocale Use current locale setup for upper/lowercase
conversion
- --charset=s Use specified charset for upper/lowercase
+ --charset=s Use specified charset for upper/lowercase
conversion (defaults to latin1)
--debug Print debugging info
--extraflags=s Allow some non alphabetic flags
--lowercase=s Lowercase string
- --myheader=s Header file
- --printcomments Print commented lines in output
- --replacements=s Replacements file
+ --myheader=s Header file
+ --printcomments Print commented lines in output
+ --replacements=s Replacements file
--split=i Split flags with more that i entries
--uppercase=s Uppercase string
=head1 DESCRIPTION
-B<ispellaff2myspell> is a script that will convert ispell affix tables
-to myspell format in a more or less successful way.
+B<ispellaff2myspell> is a script that will convert ispell affix tables
+to myspell format in a more or less successful way.
This script does not create the dict file. Something like
@@ -368,85 +367,91 @@ should do the work, with mydict.words+ being the munched wordlist
=over 8
-=item B<--affixfile=s>
+=item B<--affixfile=s>
Affix file. You can put it directly in the command line.
-=item B<--bylocale>
+=item B<--bylocale>
-Use current locale setup for upper/lowercase conversion. Make sure
-that the selected locale match the dictionary one, or you might get
+Use current locale setup for upper/lowercase conversion. Make sure
+that the selected locale match the dictionary one, or you might get
into trouble.
-=item B<--charset=s>
+=item B<--charset=s>
-Use specified charset for upper/lowercase conversion (defaults to latin1).
+Use specified charset for upper/lowercase conversion (defaults to latin1).
Currently allowed values for charset are: latin0, latin1, latin2, latin3.
-=item B<--debug>
+=item B<--debug>
Print some debugging info.
-=item B<--extraflags:s>
+=item B<--extraflags:s>
-Allows some non alphabetic flags.
+Allows some non alphabetic flags.
-When invoked with no value the supported flags are currently those
-corresponding to chars represented with the escape char B<\> as
+When invoked with no value the supported flags are currently those
+corresponding to chars represented with the escape char B<\> as
first char. B<\> will be stripped.
-When given with the flag prefix will allow that flag and strip the
-given prefix. Be careful when giving the prefix to properly escape chars,
-e.g. you will need B<-e "\\\\"> or B<-e '\\'> for flags like B<\[> to be stripped to
-B<[>. Otherwise you might even get errors. Use B<-e "^"> to allow all
+When given with the flag prefix will allow that flag and strip the
+given prefix. Be careful when giving the prefix to properly escape chars,
+e.g. you will need B<-e "\\\\"> or B<-e '\\'> for flags like B<\[> to be stripped to
+B<[>. Otherwise you might even get errors. Use B<-e "^"> to allow all
flags and pass them unmodified.
-You will need a call to -e for each flag type, e.g.,
-B<-e "\\\\" -e "~\\\\"> (or B<-e '\\' -e '~\\'>).
+You will need a call to -e for each flag type, e.g.,
+B<-e "\\\\" -e "~\\\\"> (or B<-e '\\' -e '~\\'>).
-When a prefix is explicitly set, the default value (anything starting by B<\>)
+When a prefix is explicitly set, the default value (anything starting by B<\>)
is disabled and you need to enable it explicitly as in previous example.
-=item B<--lowercase=s>
+=item B<--lowercase=s>
-Lowercase string. Manually set the string of lowercase chars. This
+Lowercase string. Manually set the string of lowercase chars. This
requires B<--uppercase> having exactly that string but uppercase.
-
-=item B<--myheader=s>
-Header file. The myspell aff header. You need to write it
+=item B<--myheader=s>
+
+Header file. The myspell aff header. You need to write it
manually. This can contain everything you want to be before the affix table
-=item B<--printcomments>
+=item B<--printcomments>
Print commented lines in output.
-=item B<--replacements=file>
+=item B<--replacements=file>
Add a pre-defined replacements table taken from 'file' to the .aff file.
Will skip lines not beginning with REP, and set the replacements number
appropriately.
-=item B<--split=i>
+=item B<--split=i>
-Split flags with more that i entries. This can be of interest for flags
-having a lot of entries. Will split the flag in chunks containing B<i>
+Split flags with more that i entries. This can be of interest for flags
+having a lot of entries. Will split the flag in chunks containing B<i>
entries.
-=item B<--uppercase=s>
+=item B<--uppercase=s>
-Uppercase string. Manually set the sring of uppercase chars. This
+Uppercase string. Manually set the sring of uppercase chars. This
requires B<--lowercase> having exactly that string but lowercase.
=back
-If your encoding is currently unsupported you can send me a file with
-the two strings of lower and uppercase chars. Note that they must match
-exactly but case changed. It will look something like
+If your encoding is currently unsupported you can send me a separate file
+with the two strings of lower and uppercase chars. Note that they must
+match exactly but case changed. It will look something like
$lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
$uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+A safer alternative against accidental recoding is to use octal codes for
+non 7bit chars. Above strings would then look like
+
+ $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+ $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
+
=head1 SEE ALSO
The OpenOffice.org Lingucomponent Project home page
@@ -459,7 +464,7 @@ L<http://lingucomponent.openoffice.org/affix.readme>
that provides information about the basics of the myspell affix file format.
-You can also take a look at
+You can also take a look at
/usr/share/doc/libmyspell-dev/affix.readme.gz
/usr/share/doc/libmyspell-dev/README.compoundwords