Quoting Martijn van Oosterhout (kleptog@gmail.com):
> On 10 June 2011 07:15, Christian PERRIER <bubulle@debian.org> wrote:
> > This pertains to the DDTP, i.e. what's lying below the DDTSS....
> >
> > This is something that is incredibly missing from the DDTP: the concept
> > of "fuzzy" translation of paragraphs. That was one of the motivations
> > for trying to push something based on gettext.
>
> Well, this is something the DDTSS could do better once it has direct
> access to the DB: it can see the older descriptions. However, I've got
> no idea what "fuzziness" really means. I can, however, think of some
> metrics, like:
>
> - If the previous description differs from this one by less than 5
>   characters, it's probably a match. The problem is that you won't pick
>   up (e.g.) version changes then.
> - Whitespace-only changes should be trivial; should this be an
>   automatic accept?
>
> Or is it just a case of, like you suggest below, having the
> option of selecting near matches, to say "I want to see all packages
> where the changes are <10 characters", with the thought that you could
> translate these really quickly.

Fuzzy matching is a tricky topic. The gettext utilities have a nice
algorithm for this. I don't know exactly how it works, but it gives
great results on sufficiently long strings. Even a full paragraph
where a full sentence has been added will be picked up by fuzzy
matching and then marked "fuzzy" in a PO file.

The problem we have in the DDT* is that there is AFAIK no way to mark
a paragraph as "translated but should be corrected", i.e. "fuzzy".

> > This logic could be in the DDTSS, probably. However, it means
> > "remembering" things. What could be done in some way could be giving a
> > higher priority to packages that have a few paragraphs already
> > translated (as the DDTP is paragraph-based, translating a paragraph
> > that's common to several packages automatically "populates" all
> > packages that share this paragraph).
>
> This should be doable. When a description is loaded, look for fuzzy
> matches and, if one is found, plug the translation in with the prefix
> <fuzzy>.
>
> I haven't been able to find any info on how fuzzy matching in gettext
> works. If it's just Levenshtein distance, then it should be easy to
> implement.

No idea what Levenshtein distance is, but I trust you to find the
right solution..:)
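
For reference, here is a minimal sketch of the metrics discussed above: Levenshtein distance is the number of single-character insertions, deletions or substitutions needed to turn one string into another. The function names, the 5-character threshold, the automatic accept for whitespace-only changes and the "<fuzzy> " prefix handling are all assumptions made for illustration. This is not DDTSS code and not a description of how gettext actually does its matching.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def classify(old: str, new: str, max_distance: int = 5) -> str:
    """Hypothetical classifier for an old/new paragraph pair.

    max_distance=5 mirrors the "less than 5 characters" idea from the
    thread; it is an assumed value, not a tuned one.
    """
    if old == new:
        return "identical"
    old_n, new_n = normalize_ws(old), normalize_ws(new)
    if old_n == new_n:
        return "whitespace-only"
    if levenshtein(old_n, new_n) < max_distance:
        return "fuzzy"
    return "new"


def prefill(old: str, new: str, old_translation: str) -> Optional[str]:
    """Return a translation to pre-load for the new paragraph, if any."""
    kind = classify(old, new)
    if kind in ("identical", "whitespace-only"):
        return old_translation  # assumption: accept automatically
    if kind == "fuzzy":
        return "<fuzzy> " + old_translation  # translator must review
    return None
```

With these assumptions, a pure version bump such as "Version 1.2 of foo" against "Version 1.3 of foo" is at distance 1 and would be offered to the translator with the <fuzzy> prefix. A whole added sentence would need a larger or length-relative threshold.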