From: | "Alexander Staubo" <alex(at)purefiction(dot)net> |
---|---|
To: | "Richard Huxton" <dev(at)archonet(dot)com> |
Cc: | "Postgresql General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Selecting max(pk) is slow on empty set |
Date: | 2008-01-22 14:24:27 |
Message-ID: | 88daf38c0801220624g6fa3ca36p636e1dfafa915116@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 1/22/08, Richard Huxton <dev(at)archonet(dot)com> wrote:
> Although the row-estimate still seems quite high. You might want to
> increase it even further (maximum is 1000). If this is a common query,
> I'd look at an index on (user,id) rather than just (user) perhaps.
Actually that index (with the same statistics setting as before)
yields slightly worse performance:
# explain analyze select max(id) from user_messages where user_id = 13604;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.86..3.87 rows=1 width=0) (actual time=0.051..0.052
rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.86 rows=1 width=4) (actual
time=0.045..0.045 rows=0 loops=1)
-> Index Scan Backward using user_messages_user_id_id on
user_messages (cost=0.00..1486.79 rows=385 width=4) (actual
time=0.042..0.042 rows=0 loops=1)
Index Cond: (user_id = 13604)
Filter: (id IS NOT NULL)
Total runtime: 0.128 ms
Compare with the plain index on the one attribute:
# explain analyze select max(id) from user_messages where user_id = 13604;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1388.34..1388.35 rows=1 width=4) (actual
time=0.034..0.035 rows=1 loops=1)
-> Index Scan using user_messages_user on user_messages
(cost=0.00..1387.40 rows=374 width=4) (actual time=0.030..0.030 rows=0
loops=1)
Index Cond: (user_id = 13604)
Total runtime: 0.085 ms
> > But it's worrying that PostgreSQL should be so off in planning the
> > query. Does this behaviour qualify as a bug, or is this -- that is,
> > the need to tweak statistics parameters -- just your garden-variety
> > application-specific optimization?
>
> Well, it's data-specific rather than application specific I suppose. The
> issue is that there is a cost to tracking 100 values and you don't
> want to pay that on every column in every table. If user 13604 isn't in
> the list of most-common users then all it can really do is fix an upper
> bound on how many matches it can have. Of course you and I can reason
> outside of the data and guess that manu users won't have more than a
> handful of messages, but that's not something PG can do.
Absolutely. Thanks for the pointers.
Alexander.
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Huxton | 2008-01-22 14:39:24 | Re: Selecting max(pk) is slow on empty set |
Previous Message | Richard Huxton | 2008-01-22 14:14:51 | Re: Selecting max(pk) is slow on empty set |