[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: UDD: statement_timeout for guest user set to 120s



On 24/10/11 at 16:49 +0200, Lucas Nussbaum wrote:
> Hi,
> 
> Martin Zobel-Helas just set statement_timeout to 120s on UDD for the
> guest user.

Hi,

One of the problem that showed up due to this change is that a script
used by the PTS to extract package descriptions (so that they can be
displayed in a tooltip in the "binaries" box) is timeouting.

query+query plan+duration on idle samosa:
explain analyze select distinct packages.package, packages.description
from packages,
(select package, max(version) from packages group by package)
     as latest (package, version)
 where packages.package = latest.package
   and packages.version = latest.version;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=520775.93..521726.29 rows=95036 width=57) (actual time=44597.052..44642.126 rows=43791 loops=1)
   ->  Hash Join  (cost=483415.16..518400.05 rows=475176 width=57) (actual time=42003.884..44025.568 rows=444577 loops=1)
         Hash Cond: (public.packages.package = public.packages.package)
         Join Filter: (public.packages.version = (max(public.packages.version)))
         ->  HashAggregate  (cost=232807.27..233105.23 rows=23837 width=24) (actual time=6388.986..6456.052 rows=43787 loops=1)
               ->  Seq Scan on packages  (cost=0.00..228055.51 rows=950351 width=24) (actual time=180.207..4342.799 rows=696049 loops=1)
         ->  Hash  (cost=228055.51..228055.51 rows=950351 width=67) (actual time=35613.946..35613.946 rows=696049 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 15298kB
               ->  Seq Scan on packages  (cost=0.00..228055.51 rows=950351 width=67) (actual time=4829.071..33890.330 rows=696049 loops=1)
 Total runtime: 44699.135 ms
(10 rows)

Clearly the problem here is the Seq scans on packages, that require reading the
data from disk since the table usually doesn't stay in RAM. (the packages table
takes 2 GB). Apparently, when the machine is idle, the runtime is reasonable (I
could get a runtime of 14s). But if the query runs at the same time as an
import of data, it can take more than 200s.

I tried to add an INDEX:
CREATE INDEX packages_pkgverdescr_idx on packages(package, version,description);

But it is not being used:
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=499026.31..499722.83 rows=69652 width=58) (actual time=14708.197..14751.840 rows=43791 loops=1)
   ->  Hash Join  (cost=471046.39..497285.02 rows=348258 width=58) (actual time=12295.665..14135.913 rows=444577 loops=1)
         Hash Cond: (public.packages.package = public.packages.package)
         Join Filter: (public.packages.version = (max(public.packages.version)))
         ->  HashAggregate  (cost=228999.76..229306.82 rows=24565 width=24) (actual time=6314.553..6382.598 rows=43787 loops=1)
               ->  Seq Scan on packages  (cost=0.00..225517.17 rows=696517 width=24) (actual time=163.114..4332.862 rows=696049 loops=1)
         ->  Hash  (cost=225517.17..225517.17 rows=696517 width=68) (actual time=5980.429..5980.429 rows=696049 loops=1)
               Buckets: 65536  Batches: 2  Memory Usage: 30568kB
               ->  Seq Scan on packages  (cost=0.00..225517.17 rows=696517 width=68) (actual time=167.867..4632.245 rows=696049 loops=1)
 Total runtime: 14776.545 ms
(10 rows)

I also tried to rewrite the query into:
explain analyze select distinct packages.package, packages.description
from packages where (package, version) in (select package, max(version) from packages_summary group by package);

(packages_summary is the table with a restricted set of field from packages)

But it didn't improve the situation:
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=568287.40..568983.45 rows=69605 width=58) (actual time=22807.126..22850.689 rows=43791 loops=1)
   ->  Merge Semi Join  (cost=551756.24..567417.34 rows=174012 width=58) (actual time=19079.588..22188.066 rows=444577 loops=1)
         Merge Cond: (public.packages.package = public.packages.package)
         Join Filter: (public.packages.version = (max(public.packages.version)))
         ->  Sort  (cost=320419.47..322159.59 rows=696049 width=68) (actual time=12487.122..13751.439 rows=696049 loops=1)
               Sort Key: public.packages.package
               Sort Method:  external merge  Disk: 53608kB
               ->  Seq Scan on packages  (cost=0.00..225512.49 rows=696049 width=68) (actual time=165.411..4722.740 rows=696049 loops=1)
         ->  Sort  (cost=231336.77..231398.18 rows=24565 width=64) (actual time=6592.390..6730.105 rows=282886 loops=1)
               Sort Key: public.packages.package
               Sort Method:  quicksort  Memory: 3857kB
               ->  HashAggregate  (cost=228992.73..229299.80 rows=24565 width=24) (actual time=6351.003..6410.176 rows=43787 loops=1)
                     ->  Seq Scan on packages  (cost=0.00..225512.49 rows=696049 width=24) (actual time=165.005..4360.815 rows=696049 loops=1)
 Total runtime: 22902.911 ms

Any ideas?

Lucas


Reply to: