
Performance of tagfile scanning and similar things



  I've added support for tasks to aptitude, but the result is that the
program takes an extremely long time to load.  This is due to the
following code, based loosely on Jason's suggestion last time I asked
about this:

list<string> *get_tasks(const pkgCache::VerIterator &ver)
{
  if(apt_package_records &&
     !ver.end() &&
     !ver.FileList().end())
    {
      list<string> *rval=new list<string>;
      const char *start,*stop;
      pkgTagSection sec;

      // Pull out pointers to the underlying record.
      apt_package_records->Lookup(ver.FileList()).GetRec(start, stop);

      // Parse it as a section.
      sec.Scan(start, stop-start+1);

      string tasks=sec.FindS("Task");

      string::size_type loc=0, firstcomma=0;

      while( (firstcomma=tasks.find(',', loc))!=tasks.npos)
	{
	  rval->push_back(string(tasks, loc, firstcomma-loc));
	  loc=firstcomma+1;
	}

      if(loc!=tasks.size())
	rval->push_back(string(tasks, loc));

      return rval;
    }
  else
    return new list<string>;
}
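
  The comma-splitting half of the routine above does not depend on
libapt, so it can be exercised and timed on its own.  What follows is a
minimal sketch rather than aptitude's code: split_tasks is a
hypothetical name, it returns the list by value instead of through a
heap-allocated pointer, and it assumes that whitespace around each entry
should be trimmed and that empty entries should be dropped.

```cpp
#include <list>
#include <string>

// Hypothetical helper, not part of aptitude or libapt: split a
// comma-separated field value into its entries.  Assumption: whitespace
// around an entry is not significant and empty entries are dropped.
std::list<std::string> split_tasks(const std::string &tasks)
{
  std::list<std::string> rval;
  std::string::size_type loc = 0;

  while(loc <= tasks.size())
    {
      // The current entry occupies [loc, comma).
      std::string::size_type comma = tasks.find(',', loc);
      if(comma == std::string::npos)
        comma = tasks.size();

      // Skip leading whitespace; ignore the entry if nothing is left.
      std::string::size_type first = tasks.find_first_not_of(" \t\n", loc);
      if(first != std::string::npos && first < comma)
        {
          // Skip trailing whitespace, searching backwards from the comma.
          std::string::size_type last =
            tasks.find_last_not_of(" \t\n", comma - 1);
          rval.push_back(tasks.substr(first, last - first + 1));
        }

      loc = comma + 1;
    }

  return rval;
}
```

  If a loop that only calls split_tasks on a fixed string is fast, the
string handling is unlikely to be the bottleneck, which would point back
at the record lookup and section scan.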

  This code is called once per package (to build a list of which
packages are in each task).  If I make it always return an empty list,
the performance is reasonable even on low-end machines.  On the other
hand, if I return the full list and discard the result (i.e., don't do
any of the processing normally performed on the generated list),
performance is not improved, at least as far as I can see.  Because of
this, I believe the problem to be in the calls to apt code used to
extract the Task header.
  (It doesn't help my testing that the program loads in about a second
on my hardware no matter what I do.)
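
  On hardware where the whole load takes about a second, per-phase
totals are easier to compare than the overall load time.  The following
is a minimal sketch of a scope-based timer built on std::chrono; the
names phase_total and phase_timer are hypothetical and nothing here
comes from aptitude or libapt.

```cpp
#include <chrono>

// Hypothetical accumulator: sums the wall-clock time spent inside every
// scope guarded by a phase_timer bound to it.
struct phase_total
{
  std::chrono::steady_clock::duration total{};
  unsigned long calls = 0;

  double milliseconds() const
  {
    return std::chrono::duration<double, std::milli>(total).count();
  }
};

// Hypothetical scope guard: records the time between its construction
// and its destruction into the phase_total it was given.
class phase_timer
{
  phase_total &dest;
  std::chrono::steady_clock::time_point begin;

public:
  explicit phase_timer(phase_total &d)
    : dest(d), begin(std::chrono::steady_clock::now())
  {
  }

  ~phase_timer()
  {
    dest.total += std::chrono::steady_clock::now() - begin;
    ++dest.calls;
  }

  phase_timer(const phase_timer &) = delete;
  phase_timer &operator=(const phase_timer &) = delete;
};
```

  Giving the Lookup/GetRec call, the Scan call, the FindS call and the
string splitting one phase_total each, and printing the four totals
after the package loop, would show which step dominates.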

  There are some obvious things that can be optimized (e.g., I suppose
it's possible that returning a list of strings is more expensive than I
thought), but I don't think they explain the slowness of this routine.

  I was wondering if anyone on this list had an explanation.  Is this
something I can work around, or is it inherent in libapt's
routines?  (E.g., if the code has to rescan through the whole Packages
file on each iteration, that would be bad... but I thought that Lookup
would avoid that.)
  I suppose if I really have to, I could write my own Packages parser,
but that option isn't very attractive to me, even with the tagfile stuff
to build on.
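
  For a sense of scale, a single sequential pass that collects only the
Task field could look roughly like the sketch below.  It is not based on
libapt's tagfile code, and it makes several assumptions: stanzas are
separated by blank lines, each field is written as "Name: value" with
exactly that capitalization, continuation lines begin with whitespace
and can be ignored, and the Task value fits on one line.  The function
names are hypothetical.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// If `line` is "<name>: value", store the value (leading whitespace
// removed) in `out` and return true.  Assumption: exact-case match.
static bool field_value(const std::string &line, const std::string &name,
                        std::string &out)
{
  if(line.compare(0, name.size(), name) != 0 ||
     line.size() <= name.size() || line[name.size()] != ':')
    return false;

  std::string::size_type start =
    line.find_first_not_of(" \t", name.size() + 1);
  out = (start == std::string::npos) ? std::string() : line.substr(start);
  return true;
}

// Hypothetical single-pass reader: maps each package name to the raw
// value of its Task field.  Packages without a Task field are omitted.
std::map<std::string, std::string> read_task_fields(std::istream &in)
{
  std::map<std::string, std::string> rval;
  std::string line, package, task;

  // Record the stanza collected so far, then reset for the next one.
  auto flush = [&]()
    {
      if(!package.empty() && !task.empty())
        rval[package] = task;
      package.clear();
      task.clear();
    };

  while(std::getline(in, line))
    {
      if(line.empty())
        {
          flush();                  // A blank line ends the stanza.
          continue;
        }
      if(line[0] == ' ' || line[0] == '\t')
        continue;                   // Continuation line: ignored here.

      if(!field_value(line, "Package", package))
        field_value(line, "Task", task);
    }
  flush();                          // The last stanza may lack a blank line.

  return rval;
}

// Convenience wrapper for trying the reader on an in-memory string.
std::map<std::string, std::string>
read_task_fields_from_text(const std::string &text)
{
  std::istringstream in(text);
  return read_task_fields(in);
}
```

  A pass like this reads the file once from start to finish, so it
cannot rescan per package, but it also duplicates work that libapt
already does and ignores everything beyond the two fields it looks for.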

  (a bit of this is based on bug #116750.  On low-end systems, the
  difference in performance is very dramatic, as mentioned in that
  report)

    Thanks,
  Daniel

-- 
/-------------------- Daniel Burrows <dburrows@debian.org> -------------------\
|      You are standing west of a white house.  There is a mailbox here.      |
\-Evil Overlord, Inc: planning your future today. http://www.eviloverlord.com-/


