Re: Selecting max(pk) is slow on empty set

From: Richard Huxton <dev(at)archonet(dot)com>
To: Alexander Staubo <alex(at)purefiction(dot)net>
Cc: Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Selecting max(pk) is slow on empty set
Date: 2008-01-22 14:14:51
Message-ID: 4795FA5B.8030003@archonet.com
Lists: pgsql-general

Alexander Staubo wrote:
> On 1/22/08, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> Then see if an ALTER TABLE SET
>> STATISTICS 100 makes a difference.
>
> So it does:
>
> # explain analyze select max(id) from user_messages where user_id = 13604;
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=1320.52..1320.53 rows=1 width=4) (actual
> time=13.640..13.640 rows=1 loops=1)
> -> Index Scan using user_messages_user on user_messages
> (cost=0.00..1319.62 rows=358 width=4) (actual time=13.631..13.631
> rows=0 loops=1)
> Index Cond: (user_id = 13604)
> Total runtime: 13.712 ms
>
> Thank you! That solves my performance problem, at least.

Although the row estimate still seems quite high. You might want to
increase it even further (maximum is 1000). If this is a common query,
I'd look at an index on (user_id, id) rather than just (user_id) perhaps.
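Something like this, for example (untested, and the index name is just
an example, so adjust to suit):

   -- raise the per-column statistics target, then gather fresh stats
   ALTER TABLE user_messages ALTER COLUMN user_id SET STATISTICS 1000;
   ANALYZE user_messages;

   -- composite index so max(id) for a single user is a cheap index scan
   CREATE INDEX user_messages_user_id_id ON user_messages (user_id, id);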

> But it's worrying that PostgreSQL should be so off in planning the
> query. Does this behaviour qualify as a bug, or is this -- that is,
> the need to tweak statistics parameters -- just your garden-variety
> application-specific optimization?

Well, it's data-specific rather than application-specific, I suppose. The
issue is that there is a cost to tracking 100 most-common values, and you
don't want to pay that for every column in every table. If user 13604 isn't
in the list of most-common users, then all the planner can really do is put
an upper bound on how many matches there might be. Of course, you and I can
reason outside of the data and guess that many users won't have more than a
handful of messages, but that's not something PG can do.
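If you're curious, you can see what the planner has to work with by
peeking at pg_stats (read-only, just a diagnostic query):

   SELECT n_distinct, most_common_vals, most_common_freqs
   FROM pg_stats
   WHERE tablename = 'user_messages' AND attname = 'user_id';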

In theory, PG could auto-tune itself for various parameters. The problem
then is, do you:
1. Learn constantly, meaning you constantly pay the cost of checking
your decisions and never get consistent plans.
2. Learn once, in which case a change in data frequencies or usage
patterns renders your learning out of date.

You might find http://pgfoundry.org/ useful too; the pgFouine and PQA
projects there can analyse your query logs.
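To give those tools something to chew on, you'd normally turn on
statement logging in postgresql.conf, for example (the values here are
only illustrative):

   log_min_duration_statement = 100   # log statements slower than 100ms
   log_line_prefix = '%t [%p] '       # timestamps help the log analysers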

--
Richard Huxton
Archonet Ltd
