
Checking for unbalanced parentheses, quotes, ...

Hi all,

I recently searched for unbalanced parentheses, braces, quotes, ... and
was already able to fix a few (not many) in English and German.

I also performed further checks and fixed many wrong URLs and other things.

I want to commit the following; can someone please check it?

Index: english/misc/children-distros.wml
RCS file: /cvs/webwml/webwml/english/misc/children-distros.wml,v
retrieving revision 1.31
diff -u -r1.31 children-distros.wml
--- english/misc/children-distros.wml	18 Feb 2004 03:53:19 -0000	1.31
+++ english/misc/children-distros.wml	14 Mar 2004 22:36:48 -0000
@@ -230,10 +230,10 @@
 <P>Linex has drawn a lot of attention not just from non-Spanish newspapers
 like the 
 <a href="http://www.washingtonpost.com/ac2/wp-dyn?pagename=article&amp;node=&amp;contentId=A59197-2002Nov2&amp;notFound=true";>Washington Post</A>,
-but from the even the European Parliament, where it was presented in
+but even from the European Parliament, where it was presented in
 several occasions, the latest one in the annual meeting of the
-European Regions Comitee (news posted in  February 2003
-(<a href="http://www.consultia.net/esnoticia/vernoticia.asp?id=4489&amp;seccion=2";>
^^^ this parenthesis is wrong

+European Regions Comitee (news posted in February 2003
+<a href="http://www.consultia.net/esnoticia/vernoticia.asp?id=4489&amp;seccion=2";>\
 here</A> or 
 <a href="http://www.elperiodicoextremadura.com/noticias/noticia.asp?pkid=40043";>here</A>,
 in Spanish only).

I suggest that you run this check on the other translations as well (it's
very fast!).
(You can obtain ./pattern-match from http://alioth.debian.org/ ==> Code
Snippets ==> File Management. I suggest removing the LaTeX-specific patterns
from it and adding language-specific ones, such as the French quotes ">>"
and "<<".)

cd webwml/<lang>
# List every file except images
find . -type f ! -name "*.jpeg" ! -name "*.png" ! -name "*.jpg" ! \
-name "*.ico" ! -name "*.gif" > /tmp/filelist

for f in $(cat /tmp/filelist); do
  echo "$f" >> /tmp/done
  cp "$f" /tmp/file
  echo "file=$f"
  input=y
  # Re-check the working copy until it is clean or you answer "n"
  while [ x"$input" = x"y" ]; do
    # Smileys are masked so that their ")" is not counted
    if [ $(sed 's/:-)/xxx/g;s/:)/xx/g' /tmp/file | ./pattern-match | \
           tee /tmp/output | wc -l) -eq 0 ]; then
      input=n
    else
      cat /tmp/output; echo "check this file again? (y/n)"; read input
    fi
  done
done

While the script waits for your answer, edit the temporary copy /tmp/file in
another xterm to remove bogus data (comments, smileys, ...), then rerun the
test on this file. This removes most of the warnings. If a warning remains
because you really found a wrong pattern, fix it in the file named in the
output ("file=" line).

The script produces the following output:

Wrong pattern ) (no corresponding left pattern) in line 35, column 9
Wrong pattern ) (no corresponding left pattern) in line 47, column 9
Wrong pattern ) (no corresponding left pattern) in line 65, column 9
Wrong pattern ) (no corresponding left pattern) in line 72, column 56
check this file again? (y/n)

(You will notice that these warnings are caused by the enumerations a), b),
...; the script cannot recognize these. Change them to a, b, ... in
/tmp/file, then press "y" and Enter to rerun the check. Once the file is
clean, the script continues.)
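
The manual edit of enumeration markers described above could also be done with
sed. This is only a sketch: the function name mask_enumerations is made up, and
it assumes that the markers are single lowercase letters at the start of a
line (optionally indented). The ")" is replaced by an "x" rather than deleted,
so the column numbers in later warnings stay the same.

```shell
# Hypothetical helper: mask enumeration markers such as "a)" or "  b)".
# Assumption: a marker is one lowercase letter at the start of a line,
# optionally preceded by whitespace. The ")" becomes "x" so that column
# positions are preserved.
mask_enumerations() {
  sed 's/^\([[:space:]]*[a-z]\))/\1x/'
}
```

It could be used as: mask_enumerations < /tmp/file > /tmp/file.new && mv /tmp/file.new /tmp/file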

Most of the wrong patterns I found were already fixed in at least one
[...] ERRORS IN IT. If you are unsure, contact me.
The parenthesis of a smiley should not double as a closing parenthesis: use
"(text :-))" instead of "(text :-)" to reduce trouble.
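
To illustrate what such a check does, here is a rough stand-in for the
parenthesis part only. It is not the real ./pattern-match: the function name
check_parens is made up, columns are assumed to be 1-based, the message for ")"
copies the format of the output quoted earlier, and the message for an
unclosed "(" is my own invention.

```shell
# Hypothetical stand-in for ./pattern-match, parentheses only.
# Reads text on stdin and reports every ")" without a preceding "(" and
# every "(" that is never closed. Columns are counted from 1 (assumption).
check_parens() {
  awk '
    BEGIN { depth = 0 }
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "(") {
          # Remember where this "(" was opened
          depth++
          open_line[depth] = NR
          open_col[depth] = i
        } else if (c == ")") {
          if (depth > 0)
            depth--
          else
            printf "Wrong pattern ) (no corresponding left pattern) in line %d, column %d\n", NR, i
        }
      }
    }
    END {
      # Anything still open at the end of input was never closed
      for (d = 1; d <= depth; d++)
        printf "Wrong pattern ( (no corresponding right pattern) in line %d, column %d\n", open_line[d], open_col[d]
    }
  '
}
```

Smileys can be masked first, as in the loop above: sed 's/:-)/xxx/g;s/:)/xx/g' /tmp/file | check_parens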

It would be nice if you could also run

webwml/polish$ grep -ri '\.<[^>]\{0,10\}>\.' .
./CD/faq/index.wml:  image...</i>. Później wybierz rozszerzenie ,,.iso'' i pożądany
./CD/artwork/index.wml:href="$(HOME)/logos/">dostępne w różnych wersjach.</a>.</p>

As you can see, this matches many double full stops (".</a>.").
(There are too many for a single person to fix.)
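
To split the work between several people, it might help to see which files
have the most matches. A minimal sketch, using the same pattern as above; the
function name count_double_stops is made up, and it counts matching lines
rather than individual matches (that is what grep -c reports).

```shell
# Hypothetical helper: print "count<TAB>file" for every file below the
# given directory that has at least one matching line, highest count first.
count_double_stops() {
  # $1: directory to scan (defaults to the current directory)
  grep -rci '\.<[^>]\{0,10\}>\.' "${1:-.}" |
    awk -F: '{ c = $NF; if (c > 0) { sub(/:[0-9]+$/, ""); print c "\t" $0 } }' |
    sort -nr
}
```

For example: webwml/polish$ count_double_stops .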

PS: I *strongly* suggest that you grep for every typo you find, so that you
can fix all occurrences. I revised the German files this way and was already
able to fix thousands of typos. This includes fixes for "Debain", "DSFG" and
many more.
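
Such a search could be wrapped in a small loop. This is only a sketch: the
function name find_typos is made up, and it assumes a grep that supports -r
(GNU grep does). Each misspelling is searched as a whole word and as a fixed
string, and every hit is printed with its file name and line number.

```shell
# Hypothetical helper: list all occurrences of known misspellings.
find_typos() {
  # $1: directory to scan; remaining arguments: misspellings to look for
  dir=$1; shift
  for typo in "$@"; do
    # -w: whole words only, -F: fixed string, -n: show line numbers
    grep -rnwF -- "$typo" "$dir"
  done
}
```

For example: webwml/german$ find_typos . Debain DSFG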

