From: | "Scott Carey" <scott(at)richrelevance(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Query Planner not choosing hash_aggregate appropriately. |
Date: | 2008-06-25 20:42:38 |
Message-ID: | a1ec7d000806251342m654acc06u5cae6d1b76096d70@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
The query optimizer fails to use a hash aggregate most of the time. This is
an inconsistent behavior -- the queries below were happily using
hash_aggregate on a previous pg_restore from the data.
On one particular class of tables this is especially painful. The example
table has 25 million rows, and when aggregating on a column that the
optimizer expects only a few unique values, it chooses a full sort of those
25 million rows before a group aggregate, rather than using a hash aggregate
that would be 2 to 4 orders of magnitude faster and use less memory.
The simple statement of this bug is the following EXPLAIN output and
corresponding output from the statistics tables. The actual query used has
a more complicated GROUP BY and aggregation (and joins, etc), but if it
can't get the most simple version of a sub query correct, of course the
composite will be worse.
The condition will occur for any column used to group by regardless of the
estimated # of unique items on that column. Even one that has only two
unique values in a 25 million row table.
rr=# explain SELECT count(distinct v_guid) as view_count, p_type FROM
p_log.creative_display_logs_012_2008_06_15 GROUP BY
p_type;
QUERY
PLAN
------------------------------------------------------------
--------------------------------------------------
GroupAggregate (cost=5201495.80..5395385.38 rows=7 width=47)
-> Sort (cost=5201495.80..5266125.63 rows=25851932 width=47)
Sort Key: p_type
-> Seq Scan on creative_display_logs_012_2008_06_15
(cost=0.00..1223383.32 rows=25851932 width=47)
rr=# select attname, null_frac, avg_width,n_distinct
,correlation from pg_stats where
tablename='creative_display_logs_012_2008_06_15'
and attname in ('g_id', 'p_type', 'strat', 'datetime', 'ext_s_id', 't_id');
attname | null_frac | avg_width | n_distinct | correlation
----------------+-----------+-----------+------------+--------------
g_id | 0 | 8 | 14 | 0.221548
p_type | 0 | 4 | 7 | 0.350718
datetime | 0 | 8 | 12584 | 0.977156
ext_s_id | 0.001 | 38 | 11444 | -0.000842848
strat | 0 | 13 | 11 | 0.147418
t_id | 0 | 8 | 2 | 0.998711
(5 rows)
I have dumped, dropped, and restored this table twice recently. Both times
followed by a full vacuum analyze. And in both cases the query optimizer
behaves differently. In one case the poor plan only occures when using the
partition table inheritance facade rather than the direct-to-table version
above. In the other case (the current condition), all variants on the query
are bad.
This definitely occurs in general and its reproducibility is affected by
partitioning but not dependent on it as far as I can tell.
The database is tuned with the default optimizer settings for 8.3.3 plus
constraint exclusion for the partition tables enabled. Yes, hash_agg is on
(actually, commented out so the default of on is active, verified in
pg_settings)
The configuration has ample RAM and all the memory tuning parameters are
generous (shared_mem 7g, temp space 200m, sort/agg space 500m -- I've tried
various settings here with no effect on the plan, just the execution of it
w.r.t. disk based sort or mem based sort).
The table definition is the following, if that helps:
Column | Type | Modifiers
--------------------+-----------------------------+-----------
v_guid | character varying(255) |
site_id | bigint |
c_id | bigint |
item_id | bigint |
creative_id | bigint |
camp_id | bigint |
p_type | integer |
datetime | timestamp without time zone |
date | date |
ext_u_id | character varying(50) |
ext_s_id | character varying(50) |
u_guid | character varying(50) |
strat | character varying(50) |
sub_p_type | character varying(32) |
exp_id | bigint |
t_id | bigint |
htmlpi_id | bigint |
p_score | double precision |
Of course DB hints would solve this. So would some sort of tuning parameter
that lets you dial up or down the tendency to do a hash aggregate rather
than a full sort followed by a group aggregate. This is broken rather
severely, especially in combination with partitions (where it is about 3x as
likely to fail to use a hash_aggregate where appropriate in limited
experiments so far -- there are a few thousand partition tables).
All I want is it to stop being brain-dead and deciding to sort large tables
to produce aggregates. In fact, given the rarity in which a sort is
preferred over a hash_agg with large tables, and the tendancy for aggregates
to reduce the count by a factor of 10 or more -- i'd turn off the group
aggregate if possible!
Thanks for any help!
-Scott
From | Date | Subject | |
---|---|---|---|
Next Message | Henrik | 2008-06-25 22:32:03 | Re: Hardware suggestions for high performance 8.3 |
Previous Message | Greg Smith | 2008-06-25 20:05:49 | Re: Hardware vs Software RAID |