Re: SeqScan costs

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SeqScan costs
Date: 2008-08-13 04:30:37
Message-ID: 1218601837.5343.204.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-08-12 at 23:58 +0100, Gregory Stark wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>
> > On Tue, 2008-08-12 at 15:46 -0400, Tom Lane wrote:
> >> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> >> > Proposal: Make the first block of a seq scan cost random_page_cost, then
> >> > after that every additional block costs seq_page_cost.
> >>
> >> This is only going to matter for a table of 1 block (or at least very
> >> few blocks), and for such a table it's highly likely that it's in RAM
> >> anyway. So I'm unconvinced that the proposed change represents a
> >> better model of reality.
>
> I think the first block of a sequential scan is clearly a random access.

Agreed
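
For concreteness, the proposal boils down to a one-line change in the
I/O arithmetic. A minimal standalone sketch (this is not the actual
costsize.c code; the function names and the standalone framing are
mine):

    #include <stdio.h>

    typedef double Cost;

    /* Current model: every block of a seqscan is charged at
     * seq_page_cost. */
    static Cost seqscan_io_current(double pages, Cost seq_page_cost)
    {
        return seq_page_cost * pages;
    }

    /* Proposed model: the first block is charged as a random fetch,
     * every block after that as a sequential fetch. */
    static Cost seqscan_io_proposed(double pages, Cost random_page_cost,
                                    Cost seq_page_cost)
    {
        if (pages < 1)
            return 0.0;
        return random_page_cost + seq_page_cost * (pages - 1);
    }

    int main(void)
    {
        const Cost seq_page_cost = 1.0;     /* default GUC value */
        const Cost random_page_cost = 4.0;  /* default GUC value */

        /* The difference is largest for a 1-block table (1.0 vs 4.0)
         * and vanishes, proportionally, as the table grows. */
        printf("1 page:     current %.1f, proposed %.1f\n",
               seqscan_io_current(1, seq_page_cost),
               seqscan_io_proposed(1, random_page_cost, seq_page_cost));
        printf("1000 pages: current %.1f, proposed %.1f\n",
               seqscan_io_current(1000, seq_page_cost),
               seqscan_io_proposed(1000, random_page_cost, seq_page_cost));
        return 0;
    }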

> If
> that doesn't represent reality well then perhaps we need to tackle both
> problems together.

It does represent reality.

> Somehow we need to discount scan i/o cost based on how much
> of the table we expect to be in cache. For 1-block tables, if we expect
> them to be in cache we should be zeroing out all the i/o cost, whether
> random or sequential.

This isn't relevant.

> > The access cost should be the same for a 1-block table, whether it's on
> > disk or in memory.
>
> Uhm, huh? That can't be what you meant to write?

It can be. The *comparative* access cost (in the optimizer) between
random and sequential access should be exactly the same for a 1-block
table, whether the block is on disk or in memory. In reality, the time
to access a single block is identical for a 1-block table at each level
of the storage hierarchy:

* in shared_buffers (ReadBuffer)
* in filesystem cache (read() to cache)
* on disk (read() to disk)
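
To put numbers on it (with the default GUC values, seq_page_cost = 1.0
and random_page_cost = 4.0): a 1-block seqscan and a 1-block index heap
fetch each read exactly one block, yet today the optimizer charges them
1.0 and 4.0 respectively (ignoring index-page and CPU costs). Under the
proposal both would charge 4.0 for that block, so the comparison between
the two plans comes out the same whether the block is in shared_buffers,
in the filesystem cache, or on disk.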

> > AFAICS the cost cross-over is much higher than the actual elapsed time
> > cross-over for both narrow and wide tables.
> >
> > That's why using SET enable_seqscan=off helps performance in many cases,
> > or why people reduce random_page_cost to force index selection.
>
> People lower random_page_cost because we're not doing a good job estimating
> how much of a table is in cache. I think that would be a great target for some
> careful analysis. If you can come up with specific places and reasonable
> heuristics to discount i/o costs based on effective_cache_size and then
> demonstrate cases where it produces consistently better cost estimates that
> would be a huge help.
>
> I've been running benchmarks where I see accurate random_page_costs of 13-80
> on uncached data on a moderate-sized raid array. But of course when some of
> the data is cached the effective random_page_cost is much, much lower than
> that.

Agreed, but that's a harder problem and not one I wish to raise here.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
