From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Date: 2011-05-13 20:51:08
Message-ID: BANLkTimpyaL5UAcu_BcuMBS0+AgTSQJr+Q@mail.gmail.com
Lists: pgsql-performance
2011/5/13 Josh Berkus <josh(at)agliodbs(dot)com>:
>
>> I guess maybe the reason why it didn't matter for the OP is that - if
>> the size of the index in pages is smaller than the pro-rated
>> fraction of effective_cache_size allowed to the index - then the exact
>> value doesn't affect the answer.
>>
>> I apparently need to study this code more.
>
> FWIW: random_page_cost is meant to be the ratio between the cost of
> looking up a single row as an index lookup, and the cost of looking up
> that same row as part of a larger sequential scan. For specific
> storage, that coefficient should be roughly the same regardless of the
> table size. So if your plan for optimization involves manipulating RPC
> for anything other than a change of storage, you're Doing It Wrong.
>
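(Just to put rough numbers on that ratio, with made-up figures and the
stock defaults:

    seq_page_cost    = 1.0
    random_page_cost = 4.0

    1000 heap pages read sequentially      : 1000 * 1.0 = 1000 cost units
    the same 1000 pages fetched at random  : 1000 * 4.0 = 4000 cost units

before any caching correction. The coefficient describes the storage,
not the query, which is exactly why bending it per query is the wrong
knob.)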
> Instead, we should be fixing the formulas these are based on and leaving
> RPC alone.
>
> For any data page, there are actually four costs associated with each
> tuple lookup, per:
>
> in-memory/seq   | on disk/seq
> ----------------+----------------
> in-memory/random| on disk/random
It also lacks some of the more theoretical cases like sort/temp pages:
those are costed either from a ratio of seq_page_cost and
random_page_cost, or from plain seq_page_cost (when the work spills out
of work_mem). Memory access is accounted for with some small constant
(0.1, AFAIR) in a couple of places.
(And random vs. sequential memory access is the same at the level of
estimation we do.)
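For reference, and from memory (so double-check the defaults on your
version), the stock cost constants the planner mixes together are:

    seq_page_cost        = 1.0
    random_page_cost     = 4.0
    cpu_tuple_cost       = 0.01
    cpu_index_tuple_cost = 0.005
    cpu_operator_cost    = 0.0025

i.e. the per-tuple CPU charges sit two to three orders of magnitude
below a page fetch, so the page-cost terms dominate the quadrant above.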
>
> (yes, there's actually more for bitmapscan etc. but the example holds)
(If I read the sources correctly, bitmap heap fetches are costed with a
linear interpolation between random_page_cost and seq_page_cost, plus,
where needed, the Mackert and Lohman formula used for what it is best
at: predicting what should be in cache *because* of the current query's
execution, not because of the current state of the page cache.)
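For anyone following along at home, here is a rough, standalone C sketch
of that page-fetch estimate as I understand it from index_pages_fetched()
in src/backend/optimizer/path/costsize.c; the function name, variable
names and clamping are mine, so read it as an approximation of the idea
rather than a copy of the real code:

    #include <math.h>

    /*
     * Mackert & Lohman estimate (sketch, names mine): how many distinct
     * heap pages do we expect to fetch when pulling tuples_fetched tuples
     * from a table of T pages, given b buffer pages -- the pro-rated share
     * of effective_cache_size?
     */
    double
    pages_fetched_estimate(double tuples_fetched, double T, double b)
    {
        double  pages_fetched;

        if (T < 1.0)
            T = 1.0;
        if (b < 1.0)
            b = 1.0;

        if (T <= b)
        {
            /* table assumed to fit in cache: re-fetches are free */
            pages_fetched = (2.0 * T * tuples_fetched) /
                            (2.0 * T + tuples_fetched);
            if (pages_fetched > T)
                pages_fetched = T;
        }
        else
        {
            /* table bigger than cache: past lim, the cache stops helping */
            double  lim = (2.0 * T * b) / (2.0 * T - b);

            if (tuples_fetched <= lim)
                pages_fetched = (2.0 * T * tuples_fetched) /
                                (2.0 * T + tuples_fetched);
            else
                pages_fetched = b + (tuples_fetched - lim) * (T - b) / T;
        }
        return ceil(pages_fetched);
    }

The important part for this thread is where b comes from: it is
effective_cache_size pro-rated by the table's share of all the pages the
query touches, i.e. an estimate of what this query keeps warm, not of
what the OS cache happens to hold right now.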
>
> For any given tuple lookup, then, you can assign a cost based on where
> you think that tuple falls in that quadrant map. Since this is all
> probability-based, you'd be assigning a cost as a mixed % of in-memory
> and on-disk costs. Improvements in accuracy of this formula would come
> through improvements in accuracy in predicting if a particular data page
> will be in memory.
>
> This is what the combination of random_page_cost and
> effective_cache_size ought to supply, but I don't think it does, quite.
>
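To illustrate that mixing with invented numbers: if we believed 80% of
the pages a scan will touch are already in memory, and if a cached fetch
were costed at, say, 0.05 while an uncached random fetch stays at 4.0,
the blended per-page charge would be 0.8 * 0.05 + 0.2 * 4.0 = 0.84
rather than a flat 4.0. Today that blending only happens indirectly,
through the Mackert and Lohman estimate above; as far as I know there is
no explicit cached-page cost parameter to feed it with.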
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
--
Cédric Villemain                          2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Training and Support