
Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries



On Thu, 22 Sept 2022 at 21:30, Soren Stoutner
(<soren@stoutner.com>) wrote:
>
> On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
>
> > First of all, I am curious about the reasons behind this new format,
> > the problems it deals with and its advantages. I assume they are valid
> > enough, but they imply yet another spellchecking engine/format. We
> > currently have good old ispell, aspell and hunspell. vim has its own
> > spellchecker engine using its own format, with dicts that can be
> > created from old myspell2 dicts. We did not add vim format dicts (from
> > aspell dicts sources) since there seems to be some work to make vim
> > use hunspell directly. And now these bdict dicts.
>
> The .bdic format is specified by the upstream Chromium project, and is required by anything that is based off of Chromium's code, like Qt WebEngine.  I do not know why they went with a proprietary binary format, but I would assume that if they went to so much trouble to not use the standard Hunspell format there must have been something to make it worthwhile, like some performance improvement.  Perhaps I am giving Google too much credit for having logical reasons instead of making arbitrary decisions.

Hi, Soren

It's a pity not to have more info about the reasons for this new
format. Even if it is more efficient in terms of raw performance,
I do not think the difference is noticeable in something like chromium.

Another question is whether the bdic format is expected to change, or
whether that is very unlikely.

Thinking about this, I ran some tests with these bdic files being
generated at postinst, like emacs byte-compiled files (although my
tests were cruder), delegating everything to the qtwebengine packages.
bdic generation is not very slow, but IMHO it is not fast enough to go
this way (which would require moving qwebengine_convert_dict to the Qt
WebEngine runtime package and controlling everything from it).

One notable thing is that bdic generation failed for some of the
hunspell dicts I have installed:

++ processing an_ES.aff
[1003/125813.760330:FATAL:aff_reader.cc(305)] Did not find a space in 'y    i'.
Trace/breakpoint trap
++ processing ar.aff
[1003/125813.796753:FATAL:aff_reader.cc(123)] We don't support the IGNORE command yet. This would change how we would insert things in our lookup table.
++ processing gl_ES.aff
gl_ES.dic_delta not found.
Reading gl_ES.aff
Reading gl_ES.dic
Serializing...
Verifying...
Word does not match!
  Index:    2126
  Expected: Abū po:antropónimo is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
  Actual:   Abū po:antropónimo is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battā
ERROR converting, the dictionary does not check out OK.
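
For anyone wanting to repeat this kind of check, here is a minimal
sketch of a batch test in Python. It is not the script used for the
run above. The function name convert_all is made up for illustration,
and the converter name (qwebengine_convert_dict), its
"<input .dic> <output .bdic>" argument order, and a non-zero exit
status on failure are all assumptions to verify against the installed
tool.

```python
import subprocess
from pathlib import Path


def convert_all(dict_dir, out_dir, converter="qwebengine_convert_dict",
                run=subprocess.run):
    """Try to convert every <name>.dic that has a matching <name>.aff.

    Returns {name: (returncode, last output line)} for each failure.
    """
    failures = {}
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for aff in sorted(Path(dict_dir).glob("*.aff")):
        dic = aff.with_suffix(".dic")
        if not dic.exists():
            continue  # an .aff file without its .dic cannot be converted
        bdic = out_dir / (aff.stem + ".bdic")
        # ASSUMPTION: invocation is "<converter> <input .dic> <output .bdic>".
        proc = run([converter, str(dic), str(bdic)],
                   capture_output=True, text=True)
        # ASSUMPTION: a failed conversion gives a non-zero status; a crash
        # from a signal shows up as a negative returncode.
        if proc.returncode != 0:
            lines = (proc.stdout + proc.stderr).strip().splitlines()
            failures[aff.stem] = (proc.returncode, lines[-1] if lines else "")
    return failures
```

Calling convert_all("/usr/share/hunspell", "/tmp/bdic-test") would
return only the dictionaries that failed, which makes it easy to see
how many packages would be affected before choosing between build-time
and postinst generation. If the converter is not installed,
subprocess.run raises FileNotFoundError.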

Regards,

-- 
Agustin

