
Re: What about DDTSS do you (dis)like?

Quoting Martijn van Oosterhout (kleptog@gmail.com):
> On 10 June 2011 07:15, Christian PERRIER <bubulle@debian.org> wrote:
> > This pertains to the DDTP, ie what's lying below the DDTSS....
> >
> > This is something that is sorely missing from the DDTP: the concept
> > of "fuzzy" translation of paragraphs. That was one of the motivations
> > for trying to push something based on gettext.
> Well, this is something the DDTSS could do better once it has direct
> access to the DB, since it can then see the older descriptions. However,
> I have no idea what "fuzziness" really means. I can think of some
> metrics, though, like:
> - If the previous description differs from this one by fewer than 5
> characters, it's probably a match. The problem is that you won't pick
> up (e.g.) version changes then.
> - Whitespace-only changes should be trivial; should this be an automatic accept?
> Or is it just a case of, as you suggest below, having the option of
> selecting near matches, to say "I want to see all packages where the
> changes are <10 characters", with the thought that you could translate
> these really quickly?
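The two metrics above (whitespace-only changes, and descriptions that differ by only a few characters) could be sketched roughly as follows. This is only an illustration under assumptions: the function names and the category labels are hypothetical, the 5-character cutoff is taken from the message, and the "changed characters" count is an approximation built on Python's difflib opcodes, not a true edit distance.

```python
import difflib


def normalize_ws(text: str) -> str:
    # Collapse every run of whitespace to a single space so that reflowed
    # or re-indented descriptions compare equal.
    return " ".join(text.split())


def changed_chars(old: str, new: str) -> int:
    # Approximate number of characters touched by the edit, summed over the
    # non-equal opcodes reported by difflib.SequenceMatcher.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def classify_change(old: str, new: str, small: int = 5) -> str:
    # Hypothetical categories:
    #   "identical"  - nothing changed
    #   "whitespace" - candidate for automatic acceptance
    #   "small"      - fewer than `small` characters changed, quick to review
    #   "other"      - needs a normal translation pass
    if old == new:
        return "identical"
    if normalize_ws(old) == normalize_ws(new):
        return "whitespace"
    if changed_chars(old, new) < small:
        return "small"
    return "other"
```

For example, `classify_change("version 1.2", "version 1.3")` lands in "small", which also illustrates the caveat above: a version bump looks like a near match even though the translation may need updating.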

Fuzzy matching is a tricky topic. The gettext utilities have a nice
algorithm for this. I don't know exactly how it works, but it gives
great results on sufficiently long strings. Even a full paragraph to
which a whole sentence has been added will be picked up by fuzzy
matching and then marked "fuzzy" in a PO file.
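To give an idea of the kind of similarity-based matching described above, here is a rough sketch. It is not gettext's code: msgmerge is, as far as I understand, built on its own string-similarity function with a cutoff around 0.6, and this sketch swaps in Python's difflib.SequenceMatcher.ratio() (2*M/T, where M is the number of matched characters and T the total length of both strings) as a stand-in. The threshold and the function names are assumptions for illustration.

```python
import difflib
from typing import Dict, Optional, Tuple

# Assumed cutoff, chosen to resemble gettext's behaviour; tune on real data.
FUZZY_THRESHOLD = 0.6


def similarity(a: str, b: str) -> float:
    # 1.0 for identical strings, 0.0 when nothing matches.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_fuzzy_match(
    paragraph: str,
    known: Dict[str, str],
    threshold: float = FUZZY_THRESHOLD,
) -> Optional[Tuple[str, str, float]]:
    # Return (old_paragraph, its_translation, score) for the most similar
    # already-translated paragraph, or None if nothing clears the threshold.
    best = None
    for old, translation in known.items():
        score = similarity(paragraph, old)
        if score >= threshold and (best is None or score > best[2]):
            best = (old, translation, score)
    return best
```

With `known = {"This package provides the foo library.": "Ce paquet fournit la bibliothèque foo."}`, a paragraph that keeps that sentence and appends a second one still scores about 0.65, so the old translation would be offered as a fuzzy candidate, much like the PO-file case above.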

The problem we have in the DDT* is that there is AFAIK no way to mark
a paragraph as "translated but should be corrected", i.e. "fuzzy".
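What is missing could be modelled as a third state per paragraph translation, next to "untranslated" and "translated", roughly the way a PO file flags an entry with "#, fuzzy". The record below is a hypothetical sketch, not the DDTP's actual schema; the class and method names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UNTRANSLATED = "untranslated"
    FUZZY = "fuzzy"            # translated, but the source changed: needs review
    TRANSLATED = "translated"


@dataclass
class ParagraphTranslation:
    # Hypothetical record: one row per (source paragraph, language).
    source: str
    language: str
    text: str = ""
    state: State = State.UNTRANSLATED

    def mark_fuzzy(self, new_source: str) -> None:
        # The source paragraph changed: keep the old translation as a
        # starting point, but flag it for review instead of discarding it.
        self.source = new_source
        if self.text:
            self.state = State.FUZZY

    def accept(self, reviewed_text: str) -> None:
        # A translator reviewed (and possibly corrected) the text.
        self.text = reviewed_text
        self.state = State.TRANSLATED
```

The point of the extra state is that a changed paragraph no longer has to fall back to "untranslated": the translator sees the old text and only corrects it.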

> > This logic could probably be in the DDTSS. However, it means
> > "remembering" things. One thing that could be done is giving a
> > higher priority to packages that already have a few paragraphs
> > translated (as the DDTP is paragraph-based, translating a paragraph
> > that's common to several packages automatically "populates" all
> > packages that share this paragraph).
> This should be doable. When a description is loaded, look for fuzzy
> matches and, if one is found, plug the translation in with the prefix
> <fuzzy>.
> I haven't been able to find any info on how fuzzy matching in gettext
> works. If it's just Levenshtein distance, it should be easy to
> implement.
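For reference, Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another; it is a standard dynamic-programming computation. The sketch below pairs it with the "<fuzzy>" prefix idea from the message. The `prefill` helper, its 10-character limit, and the exact prefix format are assumptions for illustration, not existing DDTSS code.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance, keeping only the previous row,
    # so memory is O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def prefill(paragraph: str, known: Dict[str, str],
            max_distance: int = 10) -> Optional[str]:
    # Hypothetical helper: return an exact translation if one exists;
    # otherwise reuse the translation of the closest old paragraph,
    # prefixed with "<fuzzy>" so the translator knows to review it.
    if paragraph in known:
        return known[paragraph]
    best_old = min(known, key=lambda old: levenshtein(paragraph, old),
                   default=None)
    if best_old is None or levenshtein(paragraph, best_old) > max_distance:
        return None
    return "<fuzzy> " + known[best_old]
```

For instance, `levenshtein("kitten", "sitting")` is 3. Plain character distance works well for small edits, but for long paragraphs with a whole added sentence a length-normalized ratio (as gettext-style matching does) is probably the better fit.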

No idea what Levenshtein distance is, but I trust you to find the right

