
Re: translations of documentation

Adam Di Carlo <apharris@burrito.onshore.com> writes:
> > CVS provides this as long as individual files aren't too long. If
> > the files are too long translators need to waste a lot of time
> > finding the areas that changes have been made.
> Hmm.  Better solution is formenting in our collective minds... stay
> tuned...

I investigated some of the things that were concerning me, mainly Jim
Clark's SP package and its ability to handle multiple character

> One of my friends, who happens to know much more about SGML than I,
> Craig Brozefsky, said that he would present here a little plan for how
> we can take care of all these issues using some nice interactions between
> SGML and CVS and some architectural aspects of SP.  I've talked to him
> about it tonight and we seem to have a basic idea which shouldn't be
> too hard to implement.

I guess we should start with what I see as the requirements for a
system to help people maintain multi-language documents, which may
have different maintainers for each language, and share common
components like graphics and tables:

1. Track the structure of the document in a manner that is language

2. Work with CVS and the basic distributed developer model which has
   several working copies checked out at once, each developer having a
   large amount of autonomy, and no specialized storage mechanism.

3. Be DTD independent, to allow translation of documents in various
   DTDs and not just debiandoc.

4. Function independent of the document processing system, or the
   editing tools used by the developers.

5. Easy to learn and configure so that it can be used by developers
   who may not be skilled in the intricacies of SGML and possibly
   using a WYSIWYG editor.  It should not be a barrier to someone
   contributing to the task of translation.

I am assuming that we will be using different files, or sets of files,
for each translation of the document, and that there is a "master"
version of the document which is the target the other translations
attempt to achieve parity with.  This "master" document requirement
may limit our process of developing documents somewhat, but I think it
fits the real world model of how these documents are being created and
it gives us a very easy model for tracking structural parity between

Assuming we have a master document A, and a translation of it,
document B, we can compare the language independent structure of the
documents by running them both through an SGML parser and doing the
equivalent of a "diff" between their LI (language independent)
components.  The LI portion of a document is defined by the set of
SGML elements that are structural in the DTD being used, like CHAPTER
and SECT, and possibly P.  It also includes those elements which are
floating: they do not make up the skeleton of the document, but they
remain the same between languages, such as REF for hyperlinking, FIG
for images, and TABLE tags.  Structural elements have to appear in a
particular place in relation to other structural elements, but
floating elements can appear in any order within their containing
structural element.

An example would be:

<sect id=blah>
	<chapter id=chp2>
		<p>Some verbiage</p>
		<table> ... </table>
	</chapter>
</sect>

Let's assume that we have defined SECT and CHAPTER as structural
elements, and TABLE and P as floating elements.  Then any translation
would have to have a SECT containing a CHAPTER.  That CHAPTER must
contain, somewhere within it, a single paragraph P and a single
TABLE.  The contents of the paragraph and the table are unspecified.
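
A minimal sketch of pulling out that LI skeleton and comparing two
documents, written in Python purely for illustration.  Everything here
is an assumption: Python's html.parser stands in for a real SGML parser
such as SP, the STRUCTURAL and FLOATING sets are hard-coded from the
example, and the names skeleton and SkeletonBuilder are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed classification, taken from the example above.
# HTMLParser lowercases tag names, so the sets are lowercase.
STRUCTURAL = {"sect", "chapter"}
FLOATING = {"p", "table"}


class SkeletonBuilder(HTMLParser):
    """Build a tree of structural elements and count the floating
    elements found inside each one.  Text content is ignored."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "children": [], "floating": Counter()}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            node = {"tag": tag, "children": [], "floating": Counter()}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
        elif tag in FLOATING:
            # Order of floating elements does not matter, only the count.
            self.stack[-1]["floating"][tag] += 1

    def handle_endtag(self, tag):
        if tag in STRUCTURAL and len(self.stack) > 1 \
                and self.stack[-1]["tag"] == tag:
            self.stack.pop()


def skeleton(text):
    """Return the language independent skeleton of a document."""
    builder = SkeletonBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


MASTER = """<sect id=blah><chapter id=chp2>
<p>Some verbiage</p><table> ... </table>
</chapter></sect>"""

FRENCH = """<sect id=blah><chapter id=chp2>
<table> ... </table><p>Un peu de texte</p>
</chapter></sect>"""

NO_TABLE = """<sect id=blah><chapter id=chp2>
<p>Un peu de texte</p>
</chapter></sect>"""

print(skeleton(MASTER) == skeleton(FRENCH))    # same skeleton
print(skeleton(MASTER) == skeleton(NO_TABLE))  # TABLE is missing
```

Because floating elements are only counted per containing structural
element, swapping the order of P and TABLE in the translation does not
register as a difference, while dropping the TABLE does.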

In order to define which elements of a DTD are structural and which
are floating, we use a definition file.  There is a definition
file for each DTD, but any document can extend or modify the definition
file for its DTD.  Let's imagine it looks something like this:

(li-structural-elements "SECT" "CHAPTER")

(li-floating-element "TABLE"
	:language '("english" "french" "spanish")
	:cardinality t)

(li-floating-element "P"
	:language '("english" "french" "spanish")
	:cardinality f)

This says that SECT and CHAPTER are structural elements.  There is
not much more to track about structural elements, since they are
probably the most LI parts of the document and change very little.
It also defines TABLE as a floating element that we should check for
when our language is English, French, or Spanish (otherwise we don't
complain if it's not there), and says that we should track its
cardinality, or how many times it appears within its containing
structural element.  P is treated much the same way, but we don't pay
attention to its cardinality, since we are assuming that it just
indicates some verbiage which may be broken up differently across
languages due to phrasing or idiom differences.
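
One possible reading of those rules, sketched in Python for
illustration only.  The names FloatingRule, RULES and
floating_mismatches are hypothetical, and the treatment of an element
whose cardinality is not tracked (it only has to be present if the
master has it) is an assumption, not something the definition format
above spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatingRule:
    languages: frozenset   # languages for which the element is checked
    cardinality: bool      # True: counts must match the master exactly


# Hand-written equivalent of the two li-floating-element entries.
LANGS = frozenset({"english", "french", "spanish"})
RULES = {
    "TABLE": FloatingRule(LANGS, True),
    "P": FloatingRule(LANGS, False),
}


def floating_mismatches(master, translation, language, rules=RULES):
    """Compare per-element counts for one structural element.

    master and translation map element names to how often they occur.
    Returns a list of human-readable problems (empty if none).
    """
    problems = []
    for name, rule in rules.items():
        if language not in rule.languages:
            continue  # not checked for this language, so never complain
        want = master.get(name, 0)
        have = translation.get(name, 0)
        if rule.cardinality:
            if want != have:
                problems.append(f"{name}: expected {want}, found {have}")
        elif want > 0 and have == 0:
            # Assumption: untracked cardinality still requires presence.
            problems.append(f"{name}: missing")
    return problems


# A French translation may split paragraphs differently...
print(floating_mismatches({"TABLE": 1, "P": 2}, {"TABLE": 1, "P": 5}, "french"))
# ...but it may not drop a table.
print(floating_mismatches({"TABLE": 2, "P": 1}, {"TABLE": 1, "P": 1}, "french"))
```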

So, to check that we have synoptic parity between translations, a
documenter runs a script which looks at the translation in question,
finds its master document and its LI structure definition, compares
the two, and notifies the documenter of any differences.  The
documenter is still allowed to check the translation in, and we place
no real limits on the rest of their work flow.  This is because we
assume that they know best, and that even an incomplete translation is
better than none at all.
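
The advisory nature of the check could look roughly like the following
Python sketch.  It assumes the LI structure of each document has
already been flattened into an indented outline, one element per line;
that outline format and the names parity_report and check are
hypothetical.  Python's standard difflib does the "diff" part.

```python
import difflib


def parity_report(master_outline, translation_outline):
    """Return unified-diff lines between two LI outlines.

    Each outline is a list of strings such as ["sect", "  chapter"].
    An empty list means the outlines match.
    """
    return list(difflib.unified_diff(
        master_outline, translation_outline,
        fromfile="master", tofile="translation", lineterm=""))


def check(master_outline, translation_outline):
    """Print differences as warnings, but always return 0 so that the
    documenter can still check the translation in."""
    for line in parity_report(master_outline, translation_outline):
        print("parity warning:", line)
    return 0


status = check(["sect", "  chapter", "  chapter"], ["sect", "  chapter"])
```

Returning 0 no matter what is the point of the design: the script
reports, and the documenter decides.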

The default is to check against the most recent version of the master
document, but the documenter can specify that the check be made
against an older version.  Version IDs are your run-of-the-mill CVS
revision numbers or tag specifiers.
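
As a sketch of how the script might fetch the master at a given
version, the Python below shells out to "cvs update -p -r", which
writes the requested revision to standard output without touching the
working copy.  The function names are hypothetical, and using the HEAD
tag to mean "most recent" is an assumption that may need adjusting for
masters kept on a branch.

```python
import subprocess


def cvs_command(path, revision=None):
    """Build the cvs invocation that prints one revision of a file.

    revision may be a revision number such as "1.4" or a tag name;
    None falls back to the HEAD tag (assumed to mean most recent).
    """
    return ["cvs", "-Q", "update", "-p",
            "-r", revision if revision else "HEAD", path]


def fetch_master(path, revision=None):
    """Return the text of the master document at the given revision.

    Must be run from inside a checked-out CVS working copy.
    """
    result = subprocess.run(cvs_command(path, revision),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(cvs_command("manual.sgml"))
print(cvs_command("manual.sgml", "1.4"))
```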

The tool could also produce templates, which have the LI structure of
the master document, so that translators have an easier time targeting
the master.  A template would not contain floating elements, just
structural ones.
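
A rough Python sketch of template generation, for illustration only.
It assumes SECT and CHAPTER are the structural elements, uses Python's
html.parser as a stand-in for a real SGML parser, and the names
TemplateWriter and make_template are hypothetical.

```python
from html.parser import HTMLParser

# Assumed structural elements (HTMLParser lowercases tag names).
STRUCTURAL = {"sect", "chapter"}


class TemplateWriter(HTMLParser):
    """Copy structural tags, with their attributes, into an indented
    outline and drop everything else (text and floating elements)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            # get_starttag_text() returns the tag exactly as written,
            # so ids such as id=blah survive into the template.
            self.lines.append("  " * self.depth + self.get_starttag_text())
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in STRUCTURAL and self.depth > 0:
            self.depth -= 1
            self.lines.append("  " * self.depth + f"</{tag}>")


def make_template(master_text):
    """Return a structural-only template for translators to fill in."""
    writer = TemplateWriter()
    writer.feed(master_text)
    writer.close()
    return "\n".join(writer.lines)


MASTER = """<sect id=blah><chapter id=chp2>
<p>Some verbiage</p><table> ... </table>
</chapter></sect>"""

print(make_template(MASTER))
```

Keeping the original attributes means a translator starting from the
template gets the same ids as the master for free.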

As far as implementation goes, I have several toolkits in Scheme which
would facilitate this, including a recursive descent parser for doing
the structural tree comparisons, which comes from Joerg Wittenberger's
SDC package.  It also makes the LI definition for the DTD very easy to
represent and parse.  It would probably be in Guile, though Bigloo has
gone to a DFSG-compatible license with its most recent release and it
compiles down to some pretty quick code.

Any comments?
