Re: vacuum analyze slows sql query

From: John Meinel <john(at)johnmeinel(dot)com>
To: patrick ~ <sidsrr(at)yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: vacuum analyze slows sql query
Date: 2004-11-08 19:29:39
Message-ID: 418FC923.2090704@johnmeinel.com
Lists: pgsql-performance

patrick ~ wrote:
[...]
> pkk=# explain analyze execute pkk_01( 241 );
>                                  QUERY PLAN
> ----------------------------------------------------------------------
>  Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213 rows=1 loops=1)
>    InitPlan
>      ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual time=2872.189..2872.189 rows=0 loops=1)
>            ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983 width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
>                  Filter: ((offer_id = $1) AND (((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
>  Total runtime: 2872.339 ms
> (6 rows)
>
> Time: 2873.479 ms
>

[...]

> So, is this the ultimate solution to this issue?
>
> --patrick

It's not so much that the correlation is < 0.5. It sounds like you're
running into the same issue I ran into in the past: you have a column
with lots of repeated values and a few exceptional ones. Notice this
part of the plan:
-> Seq Scan on pkk_purchase p0 (cost rows=13983) (actual rows=0)

Planning for a generic parameter, it thinks the query might return
14,000 rows, hence the sequential scan. Before you run ANALYZE it uses
whatever default statistics exist, and those are probably closer to
reality for this particular value.

The problem is that you probably have some values of offer_id in
pkk_purchase for which it really would return 14,000 rows (possibly
many more), and for those a seq scan is the best plan. But for the
particular value you are testing, there are very few (actually no)
matching rows.
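
If you want to double-check that kind of skew, this should show the most
common values for the column and how big a share of the table each one
is (assuming the relevant column really is offer_id, as in your plan):

   select n_distinct, most_common_vals, most_common_freqs
     from pg_stats
    where tablename = 'pkk_purchase'
      and attname = 'offer_id';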

With a prepared statement (or a function) the planner has to pick the
best plan ahead of time, without knowing what value you are going to
ask for.
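
You can see the difference directly by comparing the generic plan
against a plan built for a literal value. This is only a sketch: I'm
using "select 1" as a stand-in for your real select list, pkk_02 as a
throwaway name, and the filter from your plan above:

   prepare pkk_02(int) as
       select 1 from pkk_purchase p0
        where p0.offer_id = $1
          and (p0.expire_time::timestamp with time zone > now()
               or p0.expire_time is null
               or p0.pending = true)
          and (p0.cancel_date is null or p0.pending = true)
        limit 1;

   -- planned once, for an unknown $1
   explain analyze execute pkk_02(241);

   -- planned for this exact value
   explain analyze
       select 1 from pkk_purchase p0
        where p0.offer_id = 241
          and (p0.expire_time::timestamp with time zone > now()
               or p0.expire_time is null
               or p0.pending = true)
          and (p0.cancel_date is null or p0.pending = true)
        limit 1;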

Let's say for a second that you manage to trick it into using an index
scan, and then you call the function with one of the values that
returns thousands of rows. It will probably take 10-100 times longer
than the seq scan would.
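
If you want to measure that worst case rather than guess, you can force
the index plan in a test session. enable_seqscan is a planner switch
meant for exactly this kind of experiment, not something to leave set
in production:

   set enable_seqscan = off;
   explain analyze select ... where offer_id = ...;  -- use a common value
   reset enable_seqscan;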

So what is the solution? The only one I'm aware of is to turn your
static function into a dynamic one.

So somewhere within the function you build up a SQL query string and
call EXECUTE on it. This forces the query planner to run every time you
call the function, which means that if you call it with a "nice" value
you get the fast index scan, and if you call it with a "bad" value it
switches back to the seq scan.

The downside is that you don't get much of a benefit from using a
stored procedure, since the query planner has to run every time (as
though you issued the query manually each time). But it still might be
better for you in the long run.

Example:

instead of

create function test(int) returns int as '
declare
    x alias for $1;
    y int;
begin
    select into y ... from ... where id = x limit ...;
    return y;
end;
' language plpgsql;

use this format

create function test(int) returns int as '
declare
    x alias for $1;
    y int;
    r record;
begin
    -- select into a variable does not work inside EXECUTE, so loop
    -- over the result of the dynamic query instead; the elided select
    -- list gets an alias, val, so the loop can reference it
    for r in execute ''select ... as val from ... where id=''
            || quote_literal(x)
            || '' limit ...'' loop
        y := r.val;
    end loop;
    return y;
end;
' language plpgsql;
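
A quick sanity check once it compiles, reusing the value from your
earlier test:

   select test(241);

Call it once with a rare value and once with a common one; the run
times should reflect the two different plans.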

I think that will point you in the right direction.

John
=:->
