[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: fstrcmp



On Tue, Jun 02, 2009 at 02:32:06PM +0200, Jérôme Pouiller <jezz@sysmic.org> was heard to say:
> In another thread, Adeodato Simó wrote:
> > I can't see how it'd work here, at least without the help of some
> > on-disk structure, since we're talking about a space of 25,000
> > packages.
> 
> Naive search of matching string under a set of 25,000 strings is 
> something like 2000 times slower (maybe far more) than a correct 
> algorithm. Adeodato thinks it is not usable. I said we shouldn't reject 
> idea because of performances and I suggested a better algorithm should 
> be used.

  aptitude already scans every package regularly.  For instance, when
you ask it to install something that doesn't exist, it checks every
package name using a naive substring algorithm.  According to "time",
this takes 1.5 to 1.7 seconds on my computer, most of which appears
to be spent initializing the program.

  Fuzzy matches are more expensive, of course.  It might even be
interesting to just do an edit distance computation up to k=1 and see
how that performed, though; I bet it would be reasonable with some mild
optimization.

  Daniel


Reply to: