
Re: [translate-pootle] New backend API



Hello,

> I'll just comment loosely on a few things as I notice them or think
> about them. Forgive me if my comments are not quite in the direction
> or on the detail level you were asking for :-)

Your comments are always very useful.  A new revision of the API is
attached.

> ILanguageInfo: ISO639 is not enough - we need to include the
> possibility of ISO3166 country codes, although they shouldn't be
> mandatory.

Done.

> Please explain what you mean with a 'module'. To use gettext
> terminology, does it correspond to a PO file in GNOME, or to a
> directory of files, or to a PO domain?

I renamed it to a UnitCollection.

> For ISuggestionList you can look at the existing
> lookupserver/lookupclient implementation - it is still very basic,
> and I have some uncommitted work there, but it is a start and already
> exists.

In this case I meant more of a Rosetta-like suggestion, i.e., an
unapproved translation.  I think that the suggestion model you speak
about should live in a different part of the system.

> In terms of the statistics, I don't know if we necessarily need
> separate types for a project and for language. We need to determine
> the needs, but it might be simpler to have one type.

I gave it a little thought.  Couldn't think of anything special, so I
collapsed it for now.  Thanks for the suggestion.

> IProject: we might need information like accelerators, etc. - what we
> currently call checkerstyle. Probably just a checkerstyle, although we
> might want to define an interface for that as well, some day in the
> future. We might want to store optional version control information.

Yep, I added checkers in a low-tech way, as a set of string ids.

> I would suggest aligning the terminology (and API in general) with
> what we have in the base classes (which was based on XLIFF, as I
> recall), so for example rather 'unit' than 'message' or
> 'translation'. 

Done, thanks.  I consider this very important.

> In terms of data, we'll probably need fairly rich ways of supporting
> comments, context, states (fuzzy, needs-review, etc.), formats
> (c-format, etc.). There are _lots_ of stuff in XLIFF, we can't
> realistically support all of it immediately. But perhaps we should at
> least support most of these that I mention. People will probably
> disagree about what is important, but this list is a start. On the
> other hand, we want to work towards handling process information,
> etc. so we'll probably need a lot more.

Agreed.  We can extend the interfaces later.

Actually I was thinking of having an "annotations" style attribute on
most objects so that arbitrary data could be put in there.  It would be
best to minimize the amount of data put in there, because it's better
to have things declared explicitly in the interface, then their
semantics are clear, and they can be stored sanely in rich formats like
XLIFF.  Still, such an attribute might prove useful for storing things
such as translation owner, etc. used by other subsystems.

I would imagine a dict {string->string} on the implementation level;
that should be easy to store in most backend formats (RDB, .po, XLIFF)
without much fuss.
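
A minimal sketch of what I have in mind.  The Annotated mixin, the
annotate() helper, the Unit class and the 'owner' key are all made up for
illustration; none of them is part of the attached API.

```python
class Annotated(object):
    """Hypothetical mixin: free-form string->string metadata on an object."""

    def __init__(self):
        self.annotations = {}  # keys and values are plain strings

    def annotate(self, key, value):
        # Only strings are accepted, so that any backend (RDB, .po, XLIFF)
        # can store the data without a custom serializer.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError('annotations must map strings to strings')
        self.annotations[key] = value


class Unit(Annotated):
    """Illustrative stand-in for a translation unit implementation."""

    def __init__(self, key):
        Annotated.__init__(self)
        self.key = key


unit = Unit('Hello, world!')
unit.annotate('owner', 'gintas')
```

Anything that turns out to be used widely should then graduate from the
dict to a declared attribute of the interface.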

> In terms of actions, we'll need methods for pushing updates,
> specifying which actions to take and in what way (join, overwrite,
> overwrite if empty, ignore, turn into suggestion, etc.).  We'll
> probably want some way to trigger an action, like updating from
> version control. We'll probably need some authentication system,
> although this whole area probably needs far more consideration.

I would prefer to keep this storage layer dumb.  Of course we will need
authentication, merging, etc., but I think that these can be split
off into separate components.

At this moment authentication worries me a bit.  A lot of things can be
just postponed until they are needed, but I have some tough experiences
with security tacked on after the fact ;)  It can probably wait just a
little bit more though.

> > I wanted to use Zope interfaces for declaring the API, but decided
> > that it may not be worth it here to add another dependency.

The more I go into this, the more I want an interfaces package.  If we
want a modular system, we definitely want interfaces.  zope.interface is
relatively standard and not bound to Zope in any way.  AFAIK it's used in
Twisted and a lot of other projects.  The disadvantage is that it
contains C files which would require a C compiler.  For now we can
probably live with the current hacked-up style, but a long-term
solution would be nice.
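
For reference, the hacked-up style boils down to roughly the following.
This is a condensed sketch of the attached file, not the zope.interface
API; the Project class and its values are made up for illustration.

```python
class Interface(object):
    pass


class Unicode(Interface):
    pass


class IProject(Interface):
    name = Unicode  # declared attribute

    def keys(self):
        """Get list of available language codes."""


class ImplementationError(Exception):
    pass


def validateClass(cls, iface):
    """Check that cls provides every public name declared by iface."""
    for attrname in iface.__dict__:
        if attrname.startswith('__'):
            continue  # ignore internal attributes
        if not hasattr(cls, attrname):
            raise ImplementationError('%r does not have %r' % (cls, attrname))


class Project(object):
    """Hypothetical backend class that claims to implement IProject."""

    _interface = IProject
    name = u'Pootle'

    def keys(self):
        return ['lt', 'af']


validateClass(Project, Project._interface)  # passes silently
```

It only checks that the names exist, which is exactly the part a real
interfaces package would do better.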

> > I ran into two design problems here.  I think that they would hold
> > for any API, not just the one I sketched, so please bear with me :)
> > 
> > 1) how to add a new item to a container, let's say, a new module to
> > a language translation set.  I see two ways:
> > * use a special factory class (Abstract Factory pattern) that
> > builds the needed objects, then add them (I prefer this)
> > * have each container implement the add() method so that it
> > instantiates an empty item, adds it and returns it.  The new empty
> > item can then be updated with the required data.  This works a bit
> > like the Prototype pattern.
> > 
> I don't quite see the advantage of the first approach, since I don't
> foresee complex requirements for item creation.  For
> base.TranslationStore, we already have addsourceunit(source), so
> perhaps we can use something similar unless there is need for
> something else.

The problem here is that I want to generalize manipulation of all
containers, and this problem recurs in several places.  In case of
pofile you use the UnitClass attribute that points to the class of
the children.  Something similar could work here too I guess.
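
Roughly like this, as a sketch only.  The Container base class and the
item_class attribute name are hypothetical (modelled on the UnitClass
idea), and the concrete classes are simplified stand-ins.

```python
class Container(object):
    """Hypothetical generic container; subclasses only name the child class."""

    item_class = None  # class of the children, set by subclasses

    def __init__(self):
        self._items = {}

    def keys(self):
        return sorted(self._items)

    def __getitem__(self, key):
        return self._items[key]

    def add(self, key):
        # Instantiate an empty child, register it and hand it back so that
        # the caller can fill in the remaining data.
        if key in self._items:
            raise KeyError('%r already exists' % (key,))
        item = self.item_class(key)
        self._items[key] = item
        return item


class UnitCollection(object):
    def __init__(self, name):
        self.name = name


class Language(Container):
    item_class = UnitCollection


lang = Language()
collection = lang.add('nautilus')
```

The same add() would then serve the database, project and language
levels without a separate factory object.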

> > 2) when to save data.  Again, several choices:
> > * straight-through: always carry out the operation at once.  Grossly
> > inefficient for strings (imagine adding strings to a module one by
> > one), but might work for higher-level containers
> > * completely explicit: serialization happens when you explicitly
> > call a method save().  This is prone to bugs and not very nice
> > design: it may break the abstraction.
> > * transactional: when you modify an object, it marks itself as
> > "dirty".  The Pootle main function calls "db.startTransaction()" at
> > the beginning of processing a request and calls
> > "db.endTransaction()" at the end.  endTransaction() would collect
> > the "dirty" objects and write them to disk.  I like this one best,
> > as it leaves it to the implementation of the API how to efficiently
> > deal with changes.
> 
> The third approach might have been necessary if we had big data
> dependencies, but it might be overkill. Then again, I guess we can
> implement something simple within that API.  I'll let others comment
> on this more. I don't see why the second is necessarily that bad, but
> we'll discuss this more later.

I guess you are right.  I'm currently leaning towards this: all actions
are performed immediately with explicit exceptions (such as editing
unit collections and individual translation units).  There will probably
be a few more exceptions.  What do you think about this approach?
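
To illustrate the split, here is a small sketch.  The Storage class is a
made-up stand-in for a real backend that merely records writes, and the
other classes are heavily simplified.

```python
class Storage(object):
    """Hypothetical backend stand-in that records every write."""

    def __init__(self):
        self.writes = []

    def write(self, what):
        self.writes.append(what)


class UnitCollection(object):
    """The exception: edits are buffered until save() is called."""

    def __init__(self, storage, name):
        self.storage = storage
        self.name = name
        self._units = []

    def fill(self, units):
        self._units = list(units)  # in memory only, nothing written yet

    def save(self):
        self.storage.write(('units', self.name, len(self._units)))


class Language(object):
    """The default: structural changes hit storage immediately."""

    def __init__(self, storage):
        self.storage = storage
        self._collections = {}

    def add(self, name):
        collection = UnitCollection(self.storage, name)
        self._collections[name] = collection
        self.storage.write(('add', name))  # performed at once
        return collection


storage = Storage()
lang = Language(storage)
collection = lang.add('nautilus')   # written immediately
collection.fill(['Open', 'Close'])  # buffered, no write
collection.save()                   # one write for the whole batch
```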

I also consulted your wiki document on base classes.  I am not
completely convinced that we need multifiles and the distinction
between multistrings and translation units at this stage.  Do you think
that we can get by for now without these, or that they should be
introduced into the API early on?  Your call here ;)

Best regards,
-- 
Gintautas Miliauskas
http://gintasm.blogspot.com
"""Abstract classes (interfaces) that define the Pootle backend API.

These classes only describe the available operations.  New backends should
implement operations described here.

Fields marked as optional can have the value None.

You can use the function validateModule to check that a set of backend classes
implements the interfaces described here.
"""

# === Helper objects ===

class Interface(object): pass

# Fields
class String(Interface): pass
class Unicode(Interface): pass
class Integer(Interface): pass
class Date(Interface): pass


# === API interfaces ===

class IHaveStatistics(Interface):

    def statistics(self):
        """Return statistics for this object."""
        return IStatistics


class IStatistics(Interface):
    """Statistics."""

    total_strings = Integer
    translated_strings = Integer
    fuzzy_strings = Integer
    # TODO: untranslated, but suggested? other?


class IDatabase(IHaveStatistics):

    def keys(self):
        """Get list of available project keys."""
        return [Unicode]

    def __getitem__(self, projectid):
        """Get project object by project id."""
        return IProject

    def add(self, projectid):
        """Add a new project."""
        return IProject


class IProject(IHaveStatistics):
    """An object corresponding to a project.

    This loosely corresponds to a set of .po files for some project.

    A project may have translations to a number of languages, each translation
    divided into unit collections divided into translation units.
    """

    db = IDatabase
    name = Unicode # project name
    description = Unicode # project description (unwrapped)
    checker = [String] # A list of string identifiers for checkers
    # TODO Maybe checkers should belong to unit collections instead?
    template = Interface # IUnitCollection without the actual translations
    # TODO: Have a link to the project's ViewVC page so that we can produce
    #       direct hyperlinks to unit context in the source code.

    def keys(self):
        """Get list of available language codes."""
        return [String]

    def __getitem__(self, code):
        """Get language object by language code."""
        return ILanguage

    def add(self, langname):
        """Add a new language."""
        return ILanguage

    def statistics(self):
        return IStatistics


class ILanguageInfo(Interface):
    """General information about a language."""

    # TODO: Specify if this object could/should be shared between projects.

    code = String # ISO639 language code - optional
    country = String # optional - ISO3166 two-letter country code
    name = Unicode # complete language name (native)
    name_eng = Unicode # complete language name in English; optional TODO needed?
    specialchars = [Unicode] # list of special chars
    nplurals = Integer
    pluralequation = String # optional


class ILanguage(IHaveStatistics):
    """A set of translations for unit collections of a given project in some language."""

    project = IProject
    languageinfo = ILanguageInfo

    def keys(self):
        """Return list of unit collection ids."""
        return [String]

    def __getitem__(self, name):
        """Get unit collection by id."""
        return IUnitCollection

    def add(self, name):
        """Add a new unit collection."""
        return IUnitCollection

    def statistics(self):
        return IStatistics


class IUnitCollection(IHaveStatistics):
    """A collection of translation units

    This loosely corresponds to a .po file.

    Note that the internal container of translation units is not exposed
    directly so that the implementation can accurately track all changes.

    For efficiency reasons modifications are not recorded immediately.
    Call save() explicitly when you are done modifying the data.
    """

    name = Unicode # the id for this collection
    language = ILanguage

    def __iter__(self):
        """Return an iterable of translation units."""
        return iter(ITranslationUnit)

    def __getitem__(self, number):
        """Get translation unit by index (starting from 0)."""
        return ITranslationUnit

    def __getslice__(self, start, end):
        """Return a half-open range (by number) of translations.

        This allows slice-notation: collection[0:5] would get the first 5
        units.
        """

    def fill(self, units):
        """Clear and import all units from the given iterable."""

    def clear(self):
        """Clear all units from this collection."""

    def save(self):
        """Dump the current contents of this collection to storage."""

    def statistics(self):
        """Return unit collection statistics."""
        return IStatistics


class ISuggestion(Interface):
    """A suggestion for a particular translation unit.

    The intention of this class is to store an unapproved string, possibly
    submitted by an irregular or even unregistered translator.  The interface
    should offer a convenient way of "upgrading" suggestions to translations.
    """

    unit = Interface # ITranslationUnit; it is declared below, so it cannot be named here
    date = Date # submission date


class ITranslationUnit(Interface):
    """A single translatable string."""

    collection = IUnitCollection
    suggestions = [ISuggestion]
    context = String # context information

    # Comments: optional; can be multiline, but should be whitespace-stripped
    translator_comments = Unicode
    automatic_comments = Unicode
    reference = Unicode # TODO Should we be smarter here?

    flags = set([String]) # fuzzy, c-format, no-c-format
    # rather low-tech, but I see little wins in using real objects here.

    key = Unicode
    translation = Unicode # optional
    plurals = [Unicode] # optional


# === Validation helpers ===

# TODO: I'm reinventing the wheel here, poorly.  I would love to grab a
# real interface package such as zope.interface, but that would be an
# additional dependency.

class ImplementationError(Exception):
    pass


def validateClass(cls, iface):
    """Validate a given class against an interface."""
    for attrname, attr in iface.__dict__.items():
        if attrname.startswith('__'):
            continue # ignore internal attributes

        # Check for existence of the attribute
        try:
            real_attr = getattr(cls, attrname)
        except AttributeError:
            raise ImplementationError('%r does not have %r' % (cls, attrname))

        if isinstance(attr, type) and issubclass(attr, Interface): # attribute
            pass
        elif isinstance(attr, (list, set)): # collection of fields, e.g. [String]
            pass
        elif callable(attr): # method
            if not callable(real_attr):
                raise ImplementationError('%r of %r is not callable'
                                          % (attrname, cls))
            # TODO check signature of callable?
        else:
            raise AssertionError("shouldn't happen")


def validateModule(module, complete=False):
    """Check classes in a module against interfaces.

    The classes to be checked should have the attribute _interface
    pointing to the implemented interface.
    """
    ifaces = set()
    for name, cls in module.__dict__.items():
        if isinstance(cls, type):
            iface = getattr(cls, '_interface', None)
            if iface is not None:
                validateClass(cls, iface)
                ifaces.add(iface)

    if complete:
        pass # TODO: check if all interfaces were implemented at least once?
