
Checking for unbalanced parentheses, quotes, ...



Hi all,

I recently searched for unbalanced parentheses, braces, quotes, ... and
was already able to fix a few (not many) in English and German.

I also performed more checks and fixed many wrong URLs and other things.

I want to commit the following; can someone please check it?

Index: english/misc/children-distros.wml
===================================================================
RCS file: /cvs/webwml/webwml/english/misc/children-distros.wml,v
retrieving revision 1.31
diff -u -r1.31 children-distros.wml
--- english/misc/children-distros.wml	18 Feb 2004 03:53:19 -0000	1.31
+++ english/misc/children-distros.wml	14 Mar 2004 22:36:48 -0000
@@ -230,10 +230,10 @@
 <P>Linex has drawn a lot of attention not just from non-Spanish newspapers
 like the 
 <a href="http://www.washingtonpost.com/ac2/wp-dyn?pagename=article&amp;node=&amp;contentId=A59197-2002Nov2&amp;notFound=true";>Washington Post</A>,
-but from the even the European Parliament, where it was presented in
+but even from the European Parliament, where it was presented in
 several occasions, the latest one in the annual meeting of the
-European Regions Comitee (news posted in  February 2003
-(<a href="http://www.consultia.net/esnoticia/vernoticia.asp?id=4489&amp;seccion=2";>
^^^ this parenthesis is wrong

+European Regions Comitee (news posted in February 2003
+<a href="http://www.consultia.net/esnoticia/vernoticia.asp?id=4489&amp;seccion=2";>\
 here</A> or 
 <a href="http://www.elperiodicoextremadura.com/noticias/noticia.asp?pkid=40043";>here</A>,
 in Spanish only).

I suggest that you run this check on the other translations as well (it's
very fast!):
(You can obtain ./pattern-match from http://alioth.debian.org/ ==> Code
Snippets ==> File Management. I suggest removing the LaTeX-specific patterns
from it and adding language-specific ones, such as the French quotes ">>", "<<".)

cd webwml/<lang>
# List every file except images. ./pattern-match must be in this directory.
find . -type f ! -name "*.jpeg" ! -name "*.png" ! -name "*.jpg" ! \
-name "*.ico" ! -name "*.gif" > /tmp/filelist

# Note: this loop assumes that no file name contains whitespace.
for f in $(cat /tmp/filelist); do
  echo "$f" >> /tmp/done
  cp "$f" /tmp/file
  echo "file=$f"
  input=y
  while [ x"$input" = x"y" ]; do
    # Mask smileys with filler of the same length so columns stay correct.
    if [ $(sed 's/:-)/xxx/g;s/:)/xx/g' /tmp/file | ./pattern-match | \
           tee /tmp/output | wc -l) -eq 0 ] ;
    then
      input=n
    else
      cat /tmp/output; echo "check this file again? (y/n)"; read input
    fi
  done
done

You should edit the temporary /tmp/file in another xterm to remove bogus
data (comments, smileys, ...) and then rerun the test on this file. This
usually makes all the warnings go away. If you really have found a wrong
pattern, edit the file mentioned in the output (the "file=" line).
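
To illustrate the smiley masking, here is a small sketch of the same kind of
sed substitution the script uses (here with the g flag, so that every smiley
on a line is masked). The function name mask_smileys is only an assumption
made for this example. Each smiley is replaced by filler of the same length,
so the reported column numbers still match the real file:

```shell
# Hypothetical helper: replace smileys with same-length filler text.
mask_smileys() {
  sed 's/:-)/xxx/g;s/:)/xx/g'
}

# The smiley's ")" is masked, so the text needs its own closing parenthesis.
printf '%s\n' '(text :-))' | mask_smileys   # prints "(text xxx)" (balanced)
printf '%s\n' '(text :-)'  | mask_smileys   # prints "(text xxx"  (unbalanced)
```

After masking, the second line has an opening parenthesis with no partner,
so a checker would report it.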

The script produces the following output:

...
file=./CD/vendors/index.wml
file=./CD/vendors/info.wml
file=./CD/vendors/legal.wml
Wrong pattern ) (no corresponding left pattern) in line 35, column 9
Wrong pattern ) (no corresponding left pattern) in line 47, column 9
Wrong pattern ) (no corresponding left pattern) in line 65, column 9
Wrong pattern ) (no corresponding left pattern) in line 72, column 56
check this file again? (y/n)

(You will notice that these warnings are caused by enumerations a), b),
...; the script cannot recognize these. Change them to a, b, ... in
/tmp/file, then press "y" and Enter to rerun the check, and the script
continues.)
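
For those who cannot get hold of ./pattern-match, the following is a rough
sketch of the kind of check involved. It is NOT the real tool: check_parens
is a hypothetical name, it handles only "(" and ")", it assumes columns are
counted from 1, and the wording of the message for an unclosed "(" is
invented here to mirror the message shown above.

```shell
# Hypothetical stand-in for ./pattern-match; checks only "(" and ")".
# Reads text on stdin and prints one line per unbalanced parenthesis.
check_parens() {
  awk '
    BEGIN { depth = 0 }
    {
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "(") {
          # Remember where each still-open parenthesis started.
          depth++
          open_line[depth] = NR
          open_col[depth] = i
        } else if (c == ")") {
          if (depth > 0)
            depth--
          else
            printf "Wrong pattern ) (no corresponding left pattern) in line %d, column %d\n", NR, i
        }
      }
    }
    END {
      # Anything still open at end of input never got closed.
      for (d = 1; d <= depth; d++)
        printf "Wrong pattern ( (no corresponding right pattern) in line %d, column %d\n", open_line[d], open_col[d]
    }
  '
}

# Usage: check_parens < /tmp/file
```

An enumeration such as "a) first" trips this sketch in the same way, because
the ")" has no opening partner.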

Most of the wrong patterns I found had already been fixed in at least one
translation. TRANSLATORS, PLEASE FIX THE ENGLISH FILES AS WELL WHEN YOU FIND
ERRORS IN THEM. If you are unsure, contact me.
Smileys should not count as parentheses: write "(text :-))" instead of
"(text :-)" to avoid trouble.

It would be nice if you could also run

webwml/polish$ grep -ri '\.<[^>]\{0,10\}>\.' .
./CD/faq/index.wml:  image...</i>. Później wybierz rozszerzenie ,,.iso'' i pożądany
./CD/artwork/index.wml:href="$(HOME)/logos/">dostępne w różnych wersjach.</a>.</p>
[snip]

As you can see, this matches many double full stops (".</a>.").
(There are too many for a single person to fix.)
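
To split the work, it may help to see which files have the most matches. A
minimal sketch, assuming a grep that supports -r (GNU grep does); the
function name count_double_stops is made up for this example:

```shell
# Hypothetical helper: print "file:count" for every file below the given
# directory (default ".") that contains at least one double full stop,
# with the worst offenders first.
count_double_stops() {
  grep -rci '\.<[^>]\{0,10\}>\.' "${1:-.}" | grep -v ':0$' | sort -t: -k2,2nr
}

# Usage: cd webwml/polish && count_double_stops .
```

The sort assumes that file names contain no ":" character.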

PS: I *strongly* suggest that you run grep for every typo you find, so that
you can fix all occurrences. I revised the German files this way and was
already able to fix thousands of typos. This includes fixes for "Debain",
"DSFG" and many more.
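
A minimal sketch of such a search, assuming GNU grep; find_typos is a
hypothetical helper name, and the two typos are simply the ones mentioned
above:

```shell
# Hypothetical helper: list every occurrence (file:line:text) of each
# given typo, matched as a whole word, below the given directory.
find_typos() {
  dir=$1; shift
  for typo in "$@"; do
    # grep exits non-zero when nothing matches; ignore that and go on.
    grep -rnw -F -- "$typo" "$dir" || true
  done
}

# Usage: cd webwml/german && find_typos . Debain DSFG
```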

Jens


