[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [GSoC] Profiling SQL query (Was: [GSoC] Reg Blends Web Sentinel)



Hi Akshita,

On Thu, Mar 12, 2015 at 06:44:03PM +0530, Akshita Jha wrote:
> I was working on optimizing the query - "query_bug_packages". I have a
> small doubt regarding the query:

s/small/severe/ :-)

I confirm that your doubt is perfectly rectified.
 
> In the sub-query:
> 
> SELECT s.source, b.component, s.homepage, s.vcs_browser, s.maintainer,
> s.version, row_number() OVER (PARTITION BY s.source ORDER BY s.version DESC)
>             FROM blends_dependencies b
>             JOIN packages p ON p.package = b.package
>             JOIN bugs bu    ON bu.source = p.source
>             JOIN sources s  ON s.source  = p.source
>             WHERE blend = $1 AND b.distribution = 'debian'
>             GROUP BY s.source, b.dependency, b.component, s.homepage,
> s.vcs_browser, s.maintainer, s.version
>         ) sources -- check status of dependency relation because only
> suggested packages are less important for bugs sentinel
> 
> we are already using PARTITION BY and selecting "row_number=1", then why
> exactly do we need GROUP BY ?

Since the code was never peer reviewed?  I first went with the GROUP BY
which was sufficient for the first test queries.  Later I tried to get
rid of some duplicates via PARTITION and simply forgot to remove the
GROUP BY ...

> I tried indexing, but udd already has very well-defined indexes. I also
> tried changing the join conditions but there was not much improvement. I
> maybe wrong in my analysis but if you could help me understand the query a
> little better, it would be of great help.

I think your hint was of great help!  Droping the GROUP BY increased the
speed by an order of magnitude.  I also droped the other similarly
redundant GROUP BY (see my last commit).  This has caused a drastical
performance kick. if you compare the old result:


$ cat logs/*bugs_udd.out
...
Time: 281.581650019
Query: EXECUTE query_bug_packages('debian-games')
Rendering bugs from UDD for Blend 'debian-games' took 299 seconds
Time: 34.5106248856
Query: EXECUTE query_bug_packages('debian-gis')
Rendering bugs from UDD for Blend 'debian-gis' took 37 seconds
...
Time: 868.058537006
Query: EXECUTE query_bug_packages('debian-junior')
Rendering bugs from UDD for Blend 'debian-junior' took 892 seconds
...
Rendering bugs from UDD for Blend 'debian-med' took 27 seconds
Time: 193.018954039
Query: EXECUTE query_bug_packages('debian-multimedia')
Rendering bugs from UDD for Blend 'debian-multimedia' took 198 seconds
...
Time: 183.071342945
Query: EXECUTE query_bug_packages('debian-science')
Rendering bugs from UDD for Blend 'debian-science' took 194 seconds


cat logs/*bugs_udd.out
...
Rendering bugs from UDD for Blend 'debian-games' took 57 seconds
Rendering bugs from UDD for Blend 'debian-gis' took 9 seconds
Time: 70.1499569416
Query: EXECUTE query_bug_packages('debian-junior')
Rendering bugs from UDD for Blend 'debian-junior' took 86 seconds
...
Rendering bugs from UDD for Blend 'debian-med' took 28 seconds
Rendering bugs from UDD for Blend 'debian-multimedia' took 28 seconds
...
Rendering bugs from UDD for Blend 'debian-science' took 31 seconds


So if you compare the time from Debian Junior 892 versus 86 seconds
that's the amount of time which is perfectly acceptable.

Now I was also able to run debian-edu on my box in a sensible time
frame:

Time: 178.415522814
Query: EXECUTE query_bugs('debian-edu')
Time: 1252.45552516
Query: EXECUTE query_bug_packages('debian-edu')
Rendering bugs from UDD for Blend 'debian-edu' took 1437 seconds

I think this would be also acceptable to run on production UDD.
However, I wonder whether you could do some further review after this
issue should be clarified.  Perhaps we get another factor 2 or 4 for
speeding up to make sure we will not uselessly stress test Debian
hardware.

In summary: Thanks a lot for your help to
  a) Get bugs_udd working at all
  b) Speed it up to some acceptable state.
That's really welcome since my time capacity is to hard occupied
by other things and I have obviously not given enough love to these
tools.

Another hint: In the dir misc/sql/ in the Git repository I have stored
some test queries.  The script bugs.sh (or
bugs_including_metapackage_bugs.sh - I do not remember properly) was
used as template to test the queries quickly.  If you want to use these
feel free to change these scripts - they have no other purpose than
testing.  Make sure you check whether they are in sync with bugs_udd
first!

Kind regards

        Andreas.

-- 
http://fam-tille.de


Reply to: