Re: Planner tuning

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Planner tuning
Date: 2007-03-20 10:45:53
Message-ID: 45FFBB61.6090606@magproductions.nl
Lists: pgsql-general

Tom Lane wrote:
> Alban Hertroys <alban(at)magproductions(dot)nl> writes:
>> It seems pretty obvious that the planner underestimates the cost of
>> nestloops here, is there some way to tweak this?
>
> The real problem is the factor-of-a-thousand underestimate of the size
> of this join:

Good observation, I missed that one. Thanks.

>> ->  Nested Loop  (cost=0.00..281.74 rows=2 width=14) (actual time=0.068..14.000 rows=1683 loops=1)
>>       ->  Index Scan using fewo_location_ancestry_full_idx on fewo_location_ancestry ancestor  (cost=0.00..49.34 rows=9 width=4) (actual time=0.024..0.172 rows=41 loops=1)
>>             Index Cond: ((ancestor_id = 309) AND (ancestor_type_id = 12) AND (child_type_id = 10))
>>       ->  Index Scan using fewo_property_location_country_location_idx on fewo_property_location property_location  (cost=0.00..25.80 rows=2 width=18) (actual time=0.009..0.169 rows=41 loops=41)
>>             Index Cond: ((property_location.country_id = 300) AND ("outer".child_id = property_location.location_id))
>>             Filter: (property_state_id = 3)
>
> Have you got up-to-date ANALYZE stats for both of these tables?
> Maybe increasing the statistics targets for them would help.

Yes. At the moment this is a mostly static development database, and it
was vacuumed and analyzed quite recently.

> You may be kind of stuck because of the lack of cross-column statistics
> --- I suppose these columns are probably rather highly correlated ---
> but you should at least try pulling the levers you've got.
>
> One thought is that country_id is probably entirely determined by
> location_id, and possibly ancestor_type_id is determined by ancestor_id.

Actually, property.location_id refers to cities, which are the deepest
level in the represented data. country_id is the top level.

The ancestor id and type, and likewise the child id and type, are indeed
closely related. I changed their representation based on your suggestions.

> If so you should be leaving them out of the queries and indexes;
> they're not doing anything for you except fooling the planner about the
> net selectivity of the conditions.

I tried a few things, but it seems I am quite successful at fooling the
planner...

I changed the indices on our ancestry table so that they no longer
combine id and type on the same half of the join, a combination we're in
fact never interested in anyway. This does seem to have helped somewhat.

I tried removing country_id from the equation, but I haven't had the
patience to wait for the EXPLAIN ANALYZE runs to complete that way; they
take a long time.
I originally added country_id as an optimization: I decided to join
property_location with both period_type_property and
property_availability_month using (country_id, property_id) as FK.
That quickly narrows down the number of matching records in those
tables, which an index on property_id alone somehow didn't accomplish.

The good news is that I get results under 1s without having to
explicitly sort my subquery results.
The bad news is that the estimated row counts are still quite a bit off.
I analyzed the DB just before generating the attached result.
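
To put a number on "quite a bit off", a small script along these lines
can compare estimated and actual row counts per plan node. This is only
a sketch: the function name row_estimate_error and the regular
expression are illustrative assumptions, it handles only the text format
of EXPLAIN ANALYZE shown in this thread, and the sample lines are the
ones quoted above.

```python
import re

# Matches the estimated and actual row counts in one text-format
# EXPLAIN ANALYZE node line (hypothetical helper, not a PostgreSQL tool).
NODE_RE = re.compile(
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(\d+) width=\d+\) "
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(\d+) loops=\d+\)"
)


def row_estimate_error(plan_line):
    """Return (estimated, actual, ratio) for a plan node line, else None."""
    match = NODE_RE.search(plan_line)
    if match is None:
        return None  # e.g. "Index Cond:" or "Filter:" lines
    estimated, actual = int(match.group(1)), int(match.group(2))
    # Both counts are reported per loop, so they compare directly.
    # Guard against a zero estimate to avoid dividing by zero.
    ratio = actual / max(estimated, 1)
    return estimated, actual, ratio


# Two node lines taken from the plan quoted earlier in this message.
PLAN = """\
-> Nested Loop (cost=0.00..281.74 rows=2 width=14) (actual time=0.068..14.000 rows=1683 loops=1)
-> Index Scan using fewo_location_ancestry_full_idx on fewo_location_ancestry ancestor (cost=0.00..49.34 rows=9 width=4) (actual time=0.024..0.172 rows=41 loops=1)
"""

for line in PLAN.splitlines():
    result = row_estimate_error(line)
    if result is not None:
        est, act, ratio = result
        print(f"estimated={est} actual={act} ratio={ratio:.1f}")
```

For the nested loop this gives a ratio of roughly 840 (1683 actual rows
against an estimate of 2), which is the "factor-of-a-thousand"
underestimate pointed out above.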

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

Attachment Content-Type Size
results.txt text/plain 3.3 KB
