[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH 3/3] debListParser::DescriptionLanguage: huge speedup



On Sat, Jan 25, 2014 at 03:27:45AM +0100, Jann Horn wrote:
> ---
>  apt-pkg/deb/deblistparser.cc |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/apt-pkg/deb/deblistparser.cc b/apt-pkg/deb/deblistparser.cc
> index 68d544e..878cdd1 100644
> --- a/apt-pkg/deb/deblistparser.cc
> +++ b/apt-pkg/deb/deblistparser.cc
> @@ -199,11 +199,19 @@ string debListParser::DescriptionLanguage()
>     if (Section.FindS("Description").empty() == false)
>        return "";
>  
> +   const char *start;
> +   const char *end;
>     std::vector<string> const lang = APT::Configuration::getLanguages(true);
>     for (std::vector<string>::const_iterator l = lang.begin();
>  	l != lang.end(); ++l)
> -      if (Section.FindS(string("Description-").append(*l).c_str()).empty() == false)
> -	 return *l;
> +   {
> +      string element = *l;
> +      char search_string[12+element.size()+1];
> +      memcpy(search_string, "Description-", 12);
> +      memcpy(search_string+12, element.c_str(), element.size()+1);
> +      if (Section.Find(search_string, start, end) == true && start != end)
> +	 return element;
> +   }
>  
>     return "";
>  }
> -- 
> 1.7.10.4


Oh, I remember writing this method now and thinking that I should try to
optimize it after the general concept is workig properly… *lalala*
Multi-Language support was my first bigger project in APT ~5 years ago:
time flies, but bad code stays forever it seems ;)



Maybe we should go a step further here through: A file will most likely
either have a "Description:" or a "Description-$LANG" field.
Most of them will have a "Description-md5:" buddy around.
Very few will have no Description field at all (or only the md5 buddy).
Files with multiple "Description{,-$LANG}" fields do not exist in
practice.

So instead of looping over the languages vector we should better search
for all "Description*" fields and take the one listed first here in the
vector. Should avoid some useless passes over the record:
Currently for me this method will search for "Description-de_DE:" first
which exist for two packages and "de" hasn't 100% coverage either –
especially in the Translation-en file… I don't even want to think about
the poor souls who do not have Translations for their language…
(like tlh_AST ;) )


(In fact, the code should list all Description fields and their
 languages together, so that multiple fields in a file could be
 supported, but that wasn't something I had thought of at that time
 and now requires abi breaks and is probably not worth the effort)


btw: your string building with memcpy looks really C-stylish, does this
really make any sort of difference? I would presume the improvemnt here
is Find vs. FindS only, right?


Best regards

David Kalnischkies

Attachment: signature.asc
Description: Digital signature


Reply to: