Re: What about DDTSS do you (dis)like?

To: debian-i18n@lists.debian.org
Subject: Re: What about DDTSS do you (dis)like?
From: Christian PERRIER <bubulle@debian.org>
Date: Sat, 11 Jun 2011 14:19:09 +0200
Message-id: <20110611121909.GS4261@mykerinos.kheops.frmug.org>
In-reply-to: <BANLkTinu8XkvvSMDa23qqp4yjZmMP7gOFQ@mail.gmail.com>
References: <BANLkTikrH1aKsJd+m5GAk+5Ai9RWudixcg@mail.gmail.com> <4DF12173.2060407@gmail.com> <20110610051557.GD4261@mykerinos.kheops.frmug.org> <BANLkTinu8XkvvSMDa23qqp4yjZmMP7gOFQ@mail.gmail.com>

Quoting Martijn van Oosterhout (kleptog@gmail.com):
> On 10 June 2011 07:15, Christian PERRIER <bubulle@debian.org> wrote:
> > This pertains to the DDTP, i.e. what lies beneath the DDTSS...
> >
> > This is something that is sorely missing from the DDTP: the concept
> > of "fuzzy" translation of paragraphs. That was one of the motivations
> > for trying to push something based on gettext.
> 
> Well, this is something the DDTSS could do better once it has direct
> access to the DB, since it can then see the older descriptions.
> However, I've got no idea what "fuzziness" really means. I can think
> of some metrics, though, like:
> 
> - If the previous description differs from this by less than 5
> characters, it's probably a match. The problem is that you won't pick
> up (e.g.) version changes then.
> - Whitespace only changes should be trivial, should this be an automatic accept?
> 
> Or is it just a case of, like you suggest below, that you have the
> option of selecting near matches, to say I want to see all packages
> where the changes are <10 characters, with the thought that you could
> translate these really quickly.
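
The two metrics suggested above could be sketched roughly as follows. This is only an illustration, not anything the DDTSS actually does: the function names and the default threshold of 5 are assumptions, and the character count comes from Python's difflib rather than a true edit distance.

```python
import difflib


def normalize_ws(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def is_whitespace_only_change(old, new):
    """True when the two texts differ, but only in their whitespace."""
    return old != new and normalize_ws(old) == normalize_ws(new)


def changed_chars(old, new):
    """Rough count of characters touched between old and new.

    Not a true edit distance: it sums the size of every non-matching
    block that difflib reports.
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def is_near_match(old, new, max_changed=5):
    """Hypothetical threshold check: fewer than max_changed characters differ."""
    return changed_chars(normalize_ws(old), normalize_ws(new)) < max_changed
```

Whether a whitespace-only change should be accepted automatically remains a policy question; the sketch only detects it.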

Fuzzy matching is a tricky topic. The gettext utilities have a nice
algorithm for this. I don't know exactly how it works, but it gives
great results on sufficiently long strings. Even a full paragraph
to which a full sentence has been added will be picked up by fuzzy
matching and then marked "fuzzy" in the PO file.
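
To give an idea of the behaviour, here is a minimal sketch of a paragraph-level fuzzy lookup. It is not gettext's algorithm (which is not described here); it uses Python's difflib similarity ratio instead, and the memory contents, the `fuzzy_lookup` name and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical translation memory: old English paragraph -> translation.
MEMORY = {
    "This package contains the documentation.":
        "Ce paquet contient la documentation.",
    "Development files for the library.":
        "Fichiers de développement pour la bibliothèque.",
}


def fuzzy_lookup(paragraph, memory, cutoff=0.6):
    """Return (translation, is_fuzzy) for the closest known paragraph, or None."""
    if paragraph in memory:
        # Exact match: the translation can be reused as-is.
        return memory[paragraph], False
    close = difflib.get_close_matches(paragraph, list(memory), n=1, cutoff=cutoff)
    if not close:
        return None
    # Near match: reuse the old translation, but flag it for review.
    return memory[close[0]], True
```

With this memory, a paragraph that only gains one sentence, such as "This package contains the documentation. It also ships examples.", still finds the old translation and comes back flagged as fuzzy.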

The problem we have in the DDT* is that there is AFAIK no way to mark
a paragraph as "translated but should be corrected", i.e. "fuzzy".

> > This logic could be in the DDTSS, probably. However, it means
> > "remembering" things. What could be done in some way could be giving a
> > higher priority to packages that have a few paragraphs already
> > translated (as the DDTP is paragraph-based, translating a paragraph
> > that's common to several packages automatically "populates" all
> > packages that share this paragraph).
> 
> This should be doable. When a description is loaded, look for fuzzy
> matches and, if one is found, plug the translation in with the prefix
> <fuzzy>.
> 
> I haven't been able to find any info on how fuzzy matching in gettext
> works. If it's just Levenshtein distance, then it should be easy to
> implement.

No idea what Levenshtein distance is but I trust you to find the right
solution..:)
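
For reference, Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming computation follows; whether gettext's fuzzy matching works this way is not established in this thread.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]
```

The cost grows with the product of the two lengths, which should be acceptable for package descriptions of a few paragraphs.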
