Re: Selecting max(pk) is slow on empty set

From: "Alexander Staubo" <alex(at)purefiction(dot)net>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: "Postgresql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Selecting max(pk) is slow on empty set
Date: 2008-01-22 14:24:27
Message-ID: 88daf38c0801220624g6fa3ca36p636e1dfafa915116@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 1/22/08, Richard Huxton <dev(at)archonet(dot)com> wrote:
> Although the row-estimate still seems quite high. You might want to
> increase it even further (maximum is 1000). If this is a common query,
> I'd look at an index on (user,id) rather than just (user) perhaps.

Actually that index (with the same statistics setting as before)
yields slightly worse performance:

# explain analyze select max(id) from user_messages where user_id = 13604;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.86..3.87 rows=1 width=0) (actual time=0.051..0.052
rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.86 rows=1 width=4) (actual
time=0.045..0.045 rows=0 loops=1)
-> Index Scan Backward using user_messages_user_id_id on
user_messages (cost=0.00..1486.79 rows=385 width=4) (actual
time=0.042..0.042 rows=0 loops=1)
Index Cond: (user_id = 13604)
Filter: (id IS NOT NULL)
Total runtime: 0.128 ms

Compare with the plain index on the one attribute:

# explain analyze select max(id) from user_messages where user_id = 13604;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1388.34..1388.35 rows=1 width=4) (actual
time=0.034..0.035 rows=1 loops=1)
-> Index Scan using user_messages_user on user_messages
(cost=0.00..1387.40 rows=374 width=4) (actual time=0.030..0.030 rows=0
loops=1)
Index Cond: (user_id = 13604)
Total runtime: 0.085 ms

> > But it's worrying that PostgreSQL should be so off in planning the
> > query. Does this behaviour qualify as a bug, or is this -- that is,
> > the need to tweak statistics parameters -- just your garden-variety
> > application-specific optimization?
>
> Well, it's data-specific rather than application specific I suppose. The
> issue is that there is a cost to tracking 100 values and you don't
> want to pay that on every column in every table. If user 13604 isn't in
> the list of most-common users then all it can really do is fix an upper
> bound on how many matches it can have. Of course you and I can reason
> outside of the data and guess that manu users won't have more than a
> handful of messages, but that's not something PG can do.

Absolutely. Thanks for the pointers.

Alexander.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2008-01-22 14:39:24 Re: Selecting max(pk) is slow on empty set
Previous Message Richard Huxton 2008-01-22 14:14:51 Re: Selecting max(pk) is slow on empty set