Bug#820119: tidy reports valid NCR as invalid



Laura asked for my help on this issue.  I found that setting the
environment variable SP_CHARSET_FIXED to 1 makes the onsgmls program use
the Unicode 2.0 character set, as the referenced web page says.
However, it uses only the first 65536 characters (the iso10646-ucs-2
character set), so character number 128513 (U+1F601) triggers the error
because it lies outside that range.  For such characters to validate in
HTML files, SP_CHARSET_FIXED must be unset in the validate script.  XML
files, on the other hand, need SP_CHARSET_FIXED set.  So I suggest
something like this (patch attached):

    if ($xhtml{$htmlLevel}) {
        $ENV{'SGML_CATALOG_FILES'} = $xhtmlCatalog;
        $ENV{'SP_CHARSET_FIXED'} = 1;
        $ENV{'SP_ENCODING'} = 'xml';
    } else {
        $ENV{'SGML_CATALOG_FILES'} = $htmlCatalog;
        if (defined $charset) {
            $ENV{'SP_BCTF'} = $charset;
        } else {
            $ENV{'SP_BCTF'} = "utf-8";
        }
    }

That also changes the default character set for HTML from ISO-8859-1 to
UTF-8, because the former is not a valid BCTF option.  It appears the
validate script only uses that default when the HTML file itself
declares no character set and no character set option is passed to the
script.
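The range limit behind the original error can be checked with a few
lines of Perl, the language of the validate script.  This is a minimal
sketch: the constant UCS2_MAX and the helper fits_ucs2 are hypothetical
names, and the limit of 0xFFFF simply restates the "first 65536
characters" of iso10646-ucs-2 described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Largest code point in iso10646-ucs-2, the character set onsgmls uses
# when SP_CHARSET_FIXED=1 (the first 65536 characters, 0 .. 0xFFFF).
use constant UCS2_MAX => 0xFFFF;

# Hypothetical helper (not part of scripts/validate): true if a code
# point taken from a numeric character reference fits in UCS-2.
sub fits_ucs2 {
    my ($cp) = @_;
    return $cp >= 0 && $cp <= UCS2_MAX;
}

# The NCR from the bug report, &#128513;
my $cp = 128513;
printf "&#%d; = U+%04X: %s\n", $cp, $cp,
    fits_ucs2($cp) ? "within UCS-2" : "outside UCS-2";
```

Running it prints "&#128513; = U+1F601: outside UCS-2".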

I didn't set up the whole web site build on my machine to test whether
this change has any negative effects on pages other than en_GB.it.html,
so it needs broader testing.
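The hunk header shows this block inside foreach $file (@files), so it
appears to run once per file.  A value stored in %ENV for one XHTML file
then stays set for any HTML file validated after it in the same run (and
is inherited by onsgmls), so not setting SP_CHARSET_FIXED in the HTML
branch is not quite the same as ensuring it is unset.  Below is a
minimal sketch of a variant that clears it explicitly; set_sp_env is a
hypothetical helper, the catalog settings are omitted, and also deleting
SP_ENCODING and SP_BCTF is an extra precaution of this sketch, not
something the message above calls for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: configures the OpenSP environment for one file.
# $is_xhtml and $charset stand in for $xhtml{$htmlLevel} and $charset
# from scripts/validate; SGML_CATALOG_FILES is left out here.
sub set_sp_env {
    my ($is_xhtml, $charset) = @_;
    if ($is_xhtml) {
        # XML files need the fixed (Unicode) document character set.
        $ENV{'SP_CHARSET_FIXED'} = 1;
        $ENV{'SP_ENCODING'}      = 'xml';
        delete $ENV{'SP_BCTF'};    # drop a leftover from an HTML file
    } else {
        # Remove values that an earlier XHTML file in the same loop,
        # or the caller's environment, may have left set.
        delete $ENV{'SP_CHARSET_FIXED'};
        delete $ENV{'SP_ENCODING'};
        $ENV{'SP_BCTF'} = defined $charset ? $charset : 'utf-8';
    }
}

# Simulate validating an XHTML file followed by an HTML file.
set_sp_env(1, undef);
print "after XHTML: SP_CHARSET_FIXED=",
    $ENV{'SP_CHARSET_FIXED'} // '(unset)', "\n";
set_sp_env(0, undef);
print "after HTML:  SP_CHARSET_FIXED=",
    $ENV{'SP_CHARSET_FIXED'} // '(unset)', ", SP_BCTF=$ENV{'SP_BCTF'}\n";
```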


diff --git a/scripts/validate b/scripts/validate
index 7d20f1c..a41c1cb 100755
--- a/scripts/validate
+++ b/scripts/validate
@@ -364,16 +364,16 @@ foreach $file (@files) {
     # environment accordingly.
     if ($xhtml{$htmlLevel}) {
         $ENV{'SGML_CATALOG_FILES'} = $xhtmlCatalog;
+	$ENV{'SP_CHARSET_FIXED'} = 1;
         $ENV{'SP_ENCODING'} = 'xml';
     } else {
         $ENV{'SGML_CATALOG_FILES'} = $htmlCatalog;
         if (defined $charset) {
-            $ENV{'SP_ENCODING'} = $charset;
+            $ENV{'SP_BCTF'} = $charset;
         } else {
-            $ENV{'SP_ENCODING'} = "ISO-8859-1";
+            $ENV{'SP_BCTF'} = "utf-8";
         }
     }
-    $ENV{'SP_CHARSET_FIXED'} = 1;
 
     if ($verbose) {
         if ($file eq '-') {
