From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Andrew Schmidt <andrew(at)lifescale(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Two slightly different queries produce same results, |
Date: | 2005-11-11 09:05:37 |
Message-ID: | 43745EE1.1010208@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Andrew Schmidt wrote:
> However, I've run into a problem where one query took about twice as
> long as innodb. Some investigation and playing around with the query, I
> ended up with a slightly different query but was about 3 times as fast
> as innodb (and 5 times faster than the original query). I didn't add
> any indices and the EXPLAIN ANALYZE doesn't show any new indices being
> used with the new query. The order of which tables are read first was
> changed in the query planner and also it uses HashAggregate vs
> GroupAggregate.. not entirely sure what that means.
It's a different method of aggregating data (calculating your sum()).
The HashAggregate puts values into buckets and works best when you have
a fairly small number of different values (so each value can get its own
bucket). I think - bear in mind I'm just another user, so I'm not
familiar with the internals.
The different plans are because PG thinks it will get different numbers
of rows...
> SELECT tp.team_id, pl.position = 'G' AS is_goalie, SUM(goals) AS
> total_goals
> FROM
> team t JOIN team_players tp ON tp.team_id = t.team_id
> JOIN player_updates pu ON pu.player_id = tp.player_id
> JOIN player pl ON pl.player_id = pu.player_id
> WHERE t.pool_id = 21699
> GROUP BY tp.team_id,is_goalie;
> Explain analyze of this query:
> -> Index Scan using team_pool_id on team t
> (cost=0.00..397.90 rows=391 width=4) (actual time=0.061..2.129 rows=313
> loops=1)
> Index Cond: (pool_id = 21699)
With this index scan you have one condition and PG expects 391 matching
rows (it gets 313, so that's a pretty good estimate).
> SELECT tp.team_id,pl.position = 'G' AS is_goalie,SUM(goals) AS total_goals
> FROM
> team t JOIN team_players tp ON tp.team_id = t.team_id AND t.pool_id =
> 21699 <--- ** added pool_id
> JOIN player_updates pu ON pu.player_id = tp.player_id
> JOIN player pl ON pl.player_id = pu.player_id
> WHERE t.pool_id = 21699
> GROUP BY tp.team_id,is_goalie;
>
> The explain analyze:
> -> Index Scan using team_pool_id on
> team t (cost=0.00..10.44 rows=5 width=4) (actual time=0.049..1.556
> rows=313 loops=1)
> Index Cond: ((pool_id = 21699) AND
> (pool_id = 21699))
Here it sees two conditions, and PG is expecting that will restrict
things further. It's clearly not expecting the same condition twice, but
is presumably treating it something like "a>-4 AND a<4". This actually
makes the row estimate wrong.
So - if PG is getting this bit wrong why is query 2 faster?
Looking at the row estimates for case 1 we can see that the final cases
are badly wrong.
> GroupAggregate (cost=8742.52..9853.85 rows=44453 width=13) (actual
> time=1186.973..1432.548 rows=626 loops=1)
Here it's got the number of rows badly wrong - if you actually ended up
with 44,000 rows then maybe this would be the best plan. This is the
last step though, so what happens before?
> -> Sort (cost=8742.52..8853.66 rows=44453 width=13) (actual
> time=1186.237..1309.562 rows=40928 loops=1)
> Sort Key: tp.team_id, (pl."position" = 'G'::bpchar)
We're about right here. The problem must be in the last step, and
looking at the row estimates it seems to think that no summarising is
actually going to take place.
I think the "position = 'G'" bit is confusing it. Well, we can test that:
1. Just group by "pl.position" - remove the test. Do the estimated and
actual rows match now?
2. Rewrite the position='G' as a CASE structure
3. Do the position='G' part in a sub-query and join to that.
The results of those should show whether this is actually the problem.
--
Richard Huxton
Archonet Ltd
From | Date | Subject | |
---|---|---|---|
Next Message | Andrus | 2005-11-11 09:31:37 | Re: Best way to use indexes for partial match at |
Previous Message | Guido Neitzer | 2005-11-11 08:33:53 | Re: Ordering and unicode |