Re: reducing random_page_cost from 4 to 2 to force index scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-05-19 14:53:21
Message-ID: BANLkTiktoq=GZjBgPNakwS8_DNK2rmYi2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 18, 2011 at 11:00 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Jim Nasby wrote:
>> I think the challenge there would be how to define the scope of the
>> hot-spot. Is it the last X pages? Last X serial values? Something like
>> correlation?
>>
>> Hmm... it would be interesting if we had average relation access times for
>> each stats bucket on a per-column basis; that would give the planner a
>> better idea of how much IO overhead there would be for a given WHERE clause
>
> You've already given one reasonable first answer to your question here.  If
> you defined a usage counter for each histogram bucket, and incremented that
> each time something from it was touched, that could lead to a very rough way
> to determine access distribution.  Compute a ratio of the counts in those
> buckets, then have an estimate of the total cached percentage; multiplying
> the two will give you an idea how much of that specific bucket might be in
> memory.  It's not perfect, and you need to incorporate some sort of aging
> method to it (probably weighted average based), but the basic idea could
> work.

Maybe I'm missing something here, but it seems like that would be
nightmarishly slow. Every time you read a tuple, you'd have to look
at every column of the tuple and determine which histogram bucket it
was in (or, presumably, which MCV it is, since those aren't included
in working out the histogram buckets). That seems like it would slow
down a sequential scan by at least 10x.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jim Nasby 2011-05-19 18:39:58 Re: reducing random_page_cost from 4 to 2 to force index scan
Previous Message Greg Smith 2011-05-19 03:00:31 Re: reducing random_page_cost from 4 to 2 to force index scan