From: Richard Huxton <dev(at)archonet(dot)com>
To: Alexander Staubo <alex(at)purefiction(dot)net>
Cc: Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Selecting max(pk) is slow on empty set
Date: 2008-01-22 14:14:51
Message-ID: 4795FA5B.8030003@archonet.com
Lists: pgsql-general
Alexander Staubo wrote:
> On 1/22/08, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> Then see if an ALTER TABLE SET
>> STATISTICS 100 makes a difference.
>
> So it does:
>
> # explain analyze select max(id) from user_messages where user_id = 13604;
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=1320.52..1320.53 rows=1 width=4) (actual
> time=13.640..13.640 rows=1 loops=1)
> -> Index Scan using user_messages_user on user_messages
> (cost=0.00..1319.62 rows=358 width=4) (actual time=13.631..13.631
> rows=0 loops=1)
> Index Cond: (user_id = 13604)
> Total runtime: 13.712 ms
>
> Thank you! That solves my performance problem, at least.
Although the row estimate still seems quite high, so you might want to
increase it even further (the maximum is 1000). If this is a common query,
I'd look at an index on (user_id, id) rather than just (user_id) perhaps.
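Something along these lines should do it (a sketch only, using the table
and column names from your query - adjust the index name to taste):

  ALTER TABLE user_messages ALTER COLUMN user_id SET STATISTICS 1000;
  ANALYZE user_messages;
  -- composite index so max(id) for one user can come straight off the index
  CREATE INDEX user_messages_user_id_id ON user_messages (user_id, id);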
> But it's worrying that PostgreSQL should be so off in planning the
> query. Does this behaviour qualify as a bug, or is this -- that is,
> the need to tweak statistics parameters -- just your garden-variety
> application-specific optimization?
Well, it's data-specific rather than application-specific, I suppose. The
issue is that there is a cost to tracking the 100 most-common values for a
column, and you don't want to pay that on every column in every table. If
user 13604 isn't in the list of most-common users then all the planner can
really do is put an upper bound on how many matches there might be. Of
course, you and I can reason outside of the data and guess that many users
won't have more than a handful of messages, but that's not something PG
can do.
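If you're curious what the planner actually knows about that column, you
can peek at pg_stats (rough sketch, again using your table/column names):

  SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'user_messages'
    AND attname = 'user_id';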
In theory, PG could auto-tune itself for various parameters. The problem
then is, do you:
1. Learn constantly, meaning you constantly pay the cost of checking
your decisions and never get consistent plans.
2. Learn once, in which case a change in data frequencies or usage
patterns renders your learning out of date.
You might find the pgFouine and PQA projects on http://pgfoundry.org/
useful for analysing your query logs.
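Both work from the server's query log, so you'd want something like this
in postgresql.conf first (just a sketch - check each tool's docs for the
exact log_line_prefix it expects):

  # log any statement taking longer than 250ms
  log_min_duration_statement = 250
  # timestamp and pid on each line so the analysers can group entries
  log_line_prefix = '%t [%p] '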
--
Richard Huxton
Archonet Ltd