[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [GSoC] Profiling SQL query (Was: [GSoC] Reg Blends Web Sentinel)



Hi,

I was working on optimizing the query - "query_bug_packages". I have a small doubt regarding the query:

In the sub-query:

SELECT s.source, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version, row_number() OVER (PARTITION BY s.source ORDER BY s.version DESC)
            FROM blends_dependencies b
            JOIN packages p ON p.package = b.package
            JOIN bugs bu    ON bu.source = p.source
            JOIN sources s  ON s.source  = p.source
            WHERE blend = $1 AND b.distribution = 'debian'
            GROUP BY s.source, b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
        ) sources -- check status of dependency relation because only suggested packages are less important for bugs sentinel

we are already using PARTITION BY and selecting "row_number=1", then why exactly do we need GROUP BY ? I ran "EXPLAIN ANALYZE" on the query and it shows that sorting according to GROUP_BY consumes alot of time as postgresql incorrectly estimates the no. of rows it can get. Below is the relevant snippet of the "EXPLAIN ANALYZE select ...":

Group Key: s.source, b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
    ->  Sort  (cost=1562098.14..1573387.76 rows=4515849 width=182) (actual time=9957627.487..11431481.052 rows=32929272 loops=1)
            Sort Key: s.source, b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
            Sort Method: external merge  Disk: 5797768kB
            ->  Merge Join  (cost=179177.90..260321.57 rows=4515849 width=182) (actual time=8609.923..146112.242 rows=32929272 loops=1)
                Merge Cond: (bu.source = p.source)
                ->  Index Only Scan using bugs_source_idx on bugs bu  (cost=0.42..13573.81 rows=88898 width=9) (actual time=0.030..194.105 rows=88653 loops=1)
                Heap Fetches: 88653
                ->  Materialize  (cost=179177.48..183443.32 rows=220577 width=192) (actual time=8609.819..47071.992 rows=32944075 loops=1)
                ->  Merge Join  (cost=179177.48..182891.88 rows=220577 width=192) (actual time=8609.813..12251.926 rows=283633 loops=1)
                    Merge Cond: (s.source = p.source)
                    ->  Sort  (cost=27674.48..27877.38 rows=81159 width=175) (actual time=580.788..3621.560 rows=80830 loops=1)
                        Sort Key: s.source
                        Sort Method: external merge  Disk: 12712kB
                        ->  Seq Scan on sources s  (cost=0.00..14119.59 rows=81159 width=175) (actual time=0.191..84.195 rows=81159 loops=1)
               ->  Sort  (cost=151503.00..151647.91 rows=57963 width=17) (actual time=8028.629..8141.948 rows=283630 loops=1)


I tried indexing, but udd already has very well-defined indexes. I also tried changing the join conditions but there was not much improvement. I maybe wrong in my analysis but if you could help me understand the query a little better, it would be of great help.

PFA the ouput of "EXPLAIN ANALYZE" for "query_bug_packages".

Thanking You,
Akshita Jha 


On Wed, Mar 11, 2015 at 4:10 PM, Andreas Tille <andreas@an3as.eu> wrote:
On Wed, Mar 11, 2015 at 01:55:41PM +0530, Akshita Jha wrote:
> Hi,
>
> Sorry for the delay.

No need to sorry.

> > ...
> > If you enjoy some optimising of my initial query which was quite naive
> > without any speed optimisation in mind this would be the final step to
> > replace the original bugs.py (which needed actually replacing *because*
> > of speed considerations since it was doing everything in very small
> > queries to a remote host once you go into production were UDD runs
> > remotely).
> >
>
> Has there been any progress on this ? Should I start working on if its not
> too late ?

Its not to late.  At least I did not spent any time in optimising the
query.  I pointed other GSoC candidates to the discussion but no commit
to Git means for me no solution.

> Thanks alot.

Thanks to you

        Andreas.

--
http://fam-tille.de


--
To UNSUBSCRIBE, email to debian-blends-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: [🔎] 20150311104058.GJ28638@an3as.eu" target="_blank">https://lists.debian.org/[🔎] 20150311104058.GJ28638@an3as.eu




--
Akshita Jha
Explain Analyze output:                                                                                                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2438339.43..2460667.27 rows=1275877 width=190) (actual time=11480841.565..11480841.890 rows=893 loops=1)
   ->  Sort  (cost=2438339.43..2441529.12 rows=1275877 width=190) (actual time=11480841.565..11480841.608 rows=893 loops=1)
         Sort Key: sources.source, tmp.task, (CASE WHEN (((tmp.dependency = 'd'::bpchar) OR (tmp.dependency = 'r'::bpchar)) AND (sources.component = 'main'::text) AND (exp.experimental_flag > 0)) THEN 'depends'::text ELSE 'suggests'::text END), sources.homepage, (CASE WHEN (sources.vcs_browser IS NULL) THEN '#'::text ELSE sources.vcs_browser END), sources.maintainer
         Sort Method: quicksort  Memory: 236kB
         ->  Merge Left Join  (cost=2017319.57..2073451.99 rows=1275877 width=190) (actual time=11480825.403..11480840.867 rows=893 loops=1)
               Merge Cond: (sources.source = exp.source)
               ->  Merge Left Join  (cost=1999538.63..2023746.53 rows=11037 width=186) (actual time=11479432.519..11479437.375 rows=893 loops=1)
                     Merge Cond: (sources.source = tmp.source)
                     ->  Subquery Scan on sources  (cost=1681473.33..1686852.70 rows=828 width=171) (actual time=11465988.190..11465990.119 rows=669 loops=1)
                           Filter: (sources.row_number = 1)
                           Rows Removed by Filter: 1775
                           ->  WindowAgg  (cost=1681473.33..1684783.71 rows=165519 width=182) (actual time=11465963.684..11465965.370 rows=2444 loops=1)
                                 ->  Sort  (cost=1681473.33..1681887.13 rows=165519 width=182) (actual time=11465756.505..11465756.773 rows=2444 loops=1)
                                       Sort Key: s.source, s.version
                                       Sort Method: quicksort  Memory: 663kB
                                       ->  Group  (cost=1562098.14..1652415.12 rows=165519 width=182) (actual time=9957627.514..11465734.616 rows=2444 loops=1)
                                             Group Key: s.source, b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
                                             ->  Sort  (cost=1562098.14..1573387.76 rows=4515849 width=182) (actual time=9957627.487..11431481.052 rows=32929272 loops=1)
                                                   Sort Key: s.source, b.dependency, b.component, s.homepage, s.vcs_browser, s.maintainer, s.version
                                                   Sort Method: external merge  Disk: 5797768kB
                                                   ->  Merge Join  (cost=179177.90..260321.57 rows=4515849 width=182) (actual time=8609.923..146112.242 rows=32929272 loops=1)
                                                         Merge Cond: (bu.source = p.source)
                                                         ->  Index Only Scan using bugs_source_idx on bugs bu  (cost=0.42..13573.81 rows=88898 width=9) (actual time=0.030..194.105 rows=88653 loops=1)
                                                               Heap Fetches: 88653
                                                         ->  Materialize  (cost=179177.48..183443.32 rows=220577 width=192) (actual time=8609.819..47071.992 rows=32944075 loops=1)
                                                               ->  Merge Join  (cost=179177.48..182891.88 rows=220577 width=192) (actual time=8609.813..12251.926 rows=283633 loops=1)
                                                                     Merge Cond: (s.source = p.source)
                                                                     ->  Sort  (cost=27674.48..27877.38 rows=81159 width=175) (actual time=580.788..3621.560 rows=80830 loops=1)
                                                                           Sort Key: s.source
                                                                           Sort Method: external merge  Disk: 12712kB
                                                                           ->  Seq Scan on sources s  (cost=0.00..14119.59 rows=81159 width=175) (actual time=0.191..84.195 rows=81159 loops=1)
                                                                     ->  Sort  (cost=151503.00..151647.91 rows=57963 width=17) (actual time=8028.629..8141.948 rows=283630 loops=1)
                                                                           Sort Key: p.source
                                                                           Sort Method: quicksort  Memory: 3987kB
                                                                           ->  Hash Join  (cost=178.42..146917.30 rows=57963 width=17) (actual time=16.339..7816.764 rows=47766 loops=1)
                                                                                 Hash Cond: (p.package = b.package)
                                                                                 ->  Seq Scan on packages p  (cost=0.00..121369.75 rows=1239475 width=24) (actual time=14.406..7263.532 rows=1239566 loops=1)
                                                                                 ->  Hash  (cost=159.64..159.64 rows=1503 width=18) (actual time=1.896..1.896 rows=1498 loops=1)
                                                                                       Buckets: 1024  Batches: 1  Memory Usage: 78kB
                                                                                       ->  Bitmap Heap Scan on blends_dependencies b  (cost=60.65..159.64 rows=1503 width=18) (actual time=0.401..1.282 rows=1498 loops=1)
                                                                                             Recheck Cond: (blend = 'debian-edu'::text)
                                                                                             Filter: (distribution = 'debian'::text)
                                                                                             Rows Removed by Filter: 101
                                                                                             Heap Blocks: exact=41
                                                                                             ->  Bitmap Index Scan on blends_dependencies_pkey  (cost=0.00..60.27 rows=1599 width=0) (actual time=0.369..0.369 rows=1599 loops=1)
                                                                                                   Index Cond: (blend = 'debian-edu'::text)
                     ->  Materialize  (cost=318065.29..336732.87 rows=2666 width=25) (actual time=13434.208..13435.482 rows=1093 loops=1)
                           ->  Subquery Scan on tmp  (cost=318065.29..336726.21 rows=2666 width=25) (actual time=13434.186..13435.212 rows=1093 loops=1)
                                 Filter: (tmp.row_number = 1)
                                 Rows Removed by Filter: 45
                                 ->  WindowAgg  (cost=318065.29..330061.60 rows=533169 width=29) (actual time=13434.174..13435.031 rows=1138 loops=1)
                                       ->  Sort  (cost=318065.29..319398.22 rows=533169 width=29) (actual time=13434.165..13434.278 rows=1138 loops=1)
                                             Sort Key: p_1.source, b_1.task, bdp.priority
                                             Sort Method: quicksort  Memory: 138kB
                                             ->  Group  (cost=247927.52..254592.14 rows=533169 width=29) (actual time=13385.821..13429.323 rows=1138 loops=1)
                                                   Group Key: p_1.source, b_1.task, bdp.dependency, bdp.priority
                                                   ->  Sort  (cost=247927.52..249260.45 rows=533169 width=29) (actual time=13385.817..13417.367 rows=58004 loops=1)
                                                         Sort Key: p_1.source, b_1.task, bdp.dependency, bdp.priority
                                                         Sort Method: external merge  Disk: 2184kB
                                                         ->  Merge Join  (cost=176446.38..184454.37 rows=533169 width=29) (actual time=12486.560..12524.167 rows=58004 loops=1)
                                                               Merge Cond: (bdp.dependency = b_1.dependency)
                                                               ->  Sort  (cost=146.16..151.38 rows=2090 width=9) (actual time=0.088..0.091 rows=5 loops=1)
                                                                     Sort Key: bdp.dependency
                                                                     Sort Method: quicksort  Memory: 25kB
                                                                     ->  Seq Scan on blends_dependencies_priorities bdp  (cost=0.00..30.90 rows=2090 width=9) (actual time=0.041..0.045 rows=5 loops=1)
                                                               ->  Sort  (cost=176300.23..176427.78 rows=51021 width=22) (actual time=12477.896..12488.461 rows=58004 loops=1)
                                                                     Sort Key: b_1.dependency
                                                                     Sort Method: external sort  Disk: 2112kB
                                                                     ->  Hash Join  (cost=15988.62..172310.69 rows=51021 width=22) (actual time=2149.175..12190.009 rows=58004 loops=1)
                                                                           Hash Cond: ((p_1.source = s_1.source) AND (p_1.release = s_1.release))
                                                                           ->  Hash Join  (cost=175.65..148500.90 rows=61666 width=29) (actual time=380.202..10269.907 rows=47770 loops=1)
                                                                                 Hash Cond: (p_1.package = b_1.package)
                                                                                 ->  Seq Scan on packages p_1  (cost=0.00..121369.75 rows=1239475 width=31) (actual time=349.165..9463.330 rows=1239566 loops=1)
                                                                                 ->  Hash  (cost=155.66..155.66 rows=1599 width=23) (actual time=2.994..2.994 rows=1599 loops=1)
                                                                                       Buckets: 1024  Batches: 1  Memory Usage: 93kB
                                                                                       ->  Bitmap Heap Scan on blends_dependencies b_1  (cost=60.67..155.66 rows=1599 width=23) (actual time=0.988..1.885 rows=1599 loops=1)
                                                                                             Recheck Cond: (blend = 'debian-edu'::text)
                                                                                             Heap Blocks: exact=41
                                                                                             ->  Bitmap Index Scan on blends_dependencies_pkey  (cost=0.00..60.27 rows=1599 width=0) (actual time=0.948..0.948 rows=1599 loops=1)
                                                                                                   Index Cond: (blend = 'debian-edu'::text)
                                                                           ->  Hash  (cost=14119.59..14119.59 rows=81159 width=19) (actual time=1739.343..1739.343 rows=81159 loops=1)
                                                                                 Buckets: 8192  Batches: 2  Memory Usage: 2064kB
                                                                                 ->  Seq Scan on sources s_1  (cost=0.00..14119.59 rows=81159 width=19) (actual time=6.442..1632.060 rows=81159 loops=1)
               ->  Sort  (cost=17780.94..17838.74 rows=23120 width=17) (actual time=1392.848..1395.171 rows=24971 loops=1)
                     Sort Key: exp.source
                     Sort Method: quicksort  Memory: 2289kB
                     ->  Subquery Scan on exp  (cost=15642.70..16105.10 rows=23120 width=17) (actual time=1268.256..1281.538 rows=24839 loops=1)
                           ->  HashAggregate  (cost=15642.70..15873.90 rows=23120 width=17) (actual time=1268.254..1277.493 rows=24839 loops=1)
                                 Group Key: s_2.source
                                 ->  Hash Join  (cost=1.38..15236.91 rows=81159 width=17) (actual time=12.624..1144.854 rows=79312 loops=1)
                                       Hash Cond: (s_2.release = r.release)
                                       ->  Seq Scan on sources s_2  (cost=0.00..14119.59 rows=81159 width=19) (actual time=12.576..1034.229 rows=81159 loops=1)
                                       ->  Hash  (cost=1.17..1.17 rows=17 width=36) (actual time=0.017..0.017 rows=17 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                             ->  Seq Scan on releases r  (cost=0.00..1.17 rows=17 width=36) (actual time=0.008..0.011 rows=17 loops=1)
 Planning time: 14.264 ms
 Execution time: 11481139.274 ms
(97 rows)


Reply to: