Re: Column correlation drifts, index ignored again

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Kevin Brown <kevin(at)sysexperts(dot)com>, Postgres Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Column correlation drifts, index ignored again
Date: 2004-02-24 18:29:46
Message-ID: 23533.1077647386@sss.pgh.pa.us
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Kevin,
>> 1. set enable_seqscan = on
>> 2. set random_page_cost = <some really high value to force seqscans>
>> 3. EXPLAIN ANALYZE query
>> 4. record the ratio of estimated to actual scan times.
>> 5. set enable_seqscan = off
>> 6. set random_page_cost = <rough estimate of what it should be>
>> 7. EXPLAIN ANALYZE query
>> 8. record the actual index scan time(s)
>> 9. tweak random_page_cost
>> 10. EXPLAIN query
>> 11. If ratio of estimate to actual (recorded in step 8) is much
>> different than that recorded in step 4, then go back to step 9.
>> Reduce random_page_cost if the random ratio is larger than the
>> sequential ratio, increase if it's smaller.

> Nice, we ought to post that somewhere people can find it in the future.

If we post it as a recommended procedure we had better put big caveat
notices on it. The pitfalls with doing this are:

1. If you repeat the sequence exactly as given, you will be homing in on
a RANDOM_PAGE_COST that describes your system's behavior with a fully
cached query. It is to be expected that you will end up with 1.0 or
something very close to it. The only way to avoid that is to use a
query that is large enough to blow out your kernel's RAM cache, which of
course will take long enough that iterating step 10 will be no fun,
and people will be mighty tempted to take shortcuts.
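
(A quick sanity check on that point is to compare the table's on-disk
size against physical RAM before trusting the numbers. A sketch; the
table name below is made up, and relpages is only as accurate as the
last VACUUM/ANALYZE:)

    -- Rough on-disk size of the table being scanned, assuming the
    -- default 8K block size.  'big_table' is a hypothetical name.
    SELECT relname, relpages, relpages * 8 / 1024 AS approx_mb
    FROM pg_class
    WHERE relname = 'big_table';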

2. Of course, you are computing a RANDOM_PAGE_COST that is relevant to
just this single query. Prudence would suggest repeating the process
with several different queries and taking some sort of average.
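
For concreteness, here is a minimal psql sketch of the measurement loop
described above; the table, column, index, and constants are made up and
only show the shape of the procedure, not recommended values:

    -- Steps 1-4: force a seqscan and note estimated cost vs. actual
    -- time (the "sequential ratio").
    SET enable_seqscan = on;
    SET random_page_cost = 1000;   -- absurdly high, so the planner avoids the index
    EXPLAIN ANALYZE SELECT count(*) FROM big_table WHERE some_col < 42;

    -- Steps 5-8: force the index scan (assumes an index on some_col)
    -- and note the same ratio (the "random ratio").
    SET enable_seqscan = off;
    SET random_page_cost = 4;      -- rough first guess
    EXPLAIN ANALYZE SELECT count(*) FROM big_table WHERE some_col < 42;

    -- Steps 9-11: tweak and re-check with plain EXPLAIN until the two
    -- ratios are comparable: lower random_page_cost if the random ratio
    -- is larger, raise it if it is smaller.
    SET random_page_cost = 2.5;
    EXPLAIN SELECT count(*) FROM big_table WHERE some_col < 42;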

When I did the experiments that led up to choosing 4.0 as the default,
some years ago, it took several days of thrashing the disks on a couple
of different machines before I had numbers that I didn't think were
mostly noise :-(. I am *real* suspicious of any replacement numbers
that have been derived in just a few minutes.

regards, tom lane
