Re: Selecting max(pk) is slow on empty set

From: "Alexander Staubo" <alex(at)purefiction(dot)net>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: "Postgresql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Selecting max(pk) is slow on empty set
Date: 2008-01-22 13:16:33
Message-ID: 88daf38c0801220516o7e4d828djda659033824a8aca@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 1/22/08, Richard Huxton <dev(at)archonet(dot)com> wrote:
> Hmm, but with an estimated cost of 3646 (vs.633 for the max(*) which
> uses the wrong index). That explains why it's walking backwards through
> the pkey index, it thinks that it's 8 times cheaper.
[...]
> Have a look at most_common_vals,most_common_freqs in pg_stats for
> tbl=user_messages, att=user perhaps.

# select histogram_bounds from pg_stats where
tablename='user_messages' and attname='user_id';
histogram_bounds
-------------------------------------------------------
{1,489,1097,1824,2555,3452,4488,5679,6879,8637,13448}

# select null_frac, n_distinct, most_common_vals, most_common_freqs
from pg_stats where tablename='user_messages' and attname='user_id';
null_frac | n_distinct | most_common_vals
| most_common_freqs
-----------+------------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------
0 | 2652 |
{5826,484,1206,780,823,4085,4157,5852,1962,6453} |
{0.00933333,0.00766667,0.00666667,0.00633333,0.006,0.00566667,0.00566667,0.00533333,0.005,0.00466667}

> Then see if an ALTER TABLE SET
> STATISTICS 100 makes a difference.

So it does:

# explain analyze select max(id) from user_messages where user_id = 13604;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1320.52..1320.53 rows=1 width=4) (actual
time=13.640..13.640 rows=1 loops=1)
-> Index Scan using user_messages_user on user_messages
(cost=0.00..1319.62 rows=358 width=4) (actual time=13.631..13.631
rows=0 loops=1)
Index Cond: (user_id = 13604)
Total runtime: 13.712 ms

Thank you! That solves my performance problem, at least.

But it's worrying that PostgreSQL should be so off in planning the
query. Does this behaviour qualify as a bug, or is this -- that is,
the need to tweak statistics parameters -- just your garden-variety
application-specific optimization?

Alexander.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rhys Stewart 2008-01-22 13:22:42 Re: (un)grouping question
Previous Message Josh Harrison 2008-01-22 13:14:18 Re: Online Oracle to Postgresql data migration