Quick Links

Re: index-only scans

From:	Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "postgres(at)cybertec(dot)at" <postgres(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: index-only scans
Date:	2011-09-23 14:34:34
Message-ID:	CAF6yO=06JCc_-z8_Kh418-Xr3Hs9XU7g3hEw7tjjmHrA5m54-w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

2011/8/16 Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>:
> On 08/14/2011 12:31 AM, Heikki Linnakangas wrote:
>>>
>>> The same idea could of course be used to calculate the effective cache
>>> hit ratio for each table. Cache hit ratio would have the problem of feedback
>>> loops, though.
>>
>> Yeah, I'm not excited about making the planner and statistics more
>> dynamic. Feedback loops and plan instability are not fun.
>
> I might be a little out of my league here... But I was thinking about the
> cache hit ratio and feedback loops. I understand automatic tuning would be
> hard. But making automatic tuning easier (by using pg_tune for example)
> would be a big plus for most use cases.
>
> To make it easier to tune the page read costs automatically, it would be
> nice if there would be four variables instead of the current two:
> - random_page_cost is the cost of reading a random page from storage.
> Currently it is not, it is the cost of accessing a random page, taking in
> account it might be in memory.
> - seq_page_cost is the cost of reading pages sequentially from storage
> - memory_page_cost is the cost of reading a page in memory
> - cache_hit_ratio is the expected cache hit ratio
>
> memory_page_cost would be server global, random and seq page costs
> tablespace specific, and cache_hit_ratio relation specific. You would get
> the current behavior by tuning *_page_costs realistically, and setting
> cache_hit_ratio globally so that the expected random_page_cost /
> seq_page_cost stays the same as now.
>
> The biggest advantage of this would be that the correct values are much
> easier to detect automatically compared to current situation. This can be
> done using pg_statio_* views and IO speed testing. They should not be tuned
> automatically by PostgreSQL, at least not the cache_hit_ratio, as that leads
> to the possibility of feedback loops and plan instability. The variables
> would also be much easier to understand.
>
> There is the question if one should be allowed to tune the *_page_costs at
> all. If I am not missing something, it is possible to detect the correct
> values programmatically and they do not change if you do not change the
> hardware. Cache hit ratio is the real reason why they are currently so
> important for tuning.
>
> An example why the current random_page_cost and seq_page_cost tuning is not
> adequate is that you can only set random_page_cost per tablespace. That
> makes perfect sense if random_page_cost would be the cost of accessing a
> page in storage. But it is not, it is a combination of that and caching
> effects, so that it actually varies per relation (and over time). How do you
> set it correctly for a query where one relation is fully cached and another
> one not?
>
> Another problem is that if you use random_page_cost == seq_page_cost, you
> are effectively saying that everything is in cache. But if everything is in
> cache, the cost of page access relative to cpu_*_costs is way off. The more
> random_page_cost and seq_page_cost are different, the more they mean the
> storage access costs. When they are the same, they mean the memory page
> cost. There can be an order of magnitude in difference of a storage page
> cost and a memory page cost. So it is hard to tune the cpu_*_costs
> realistically for cases where sometimes data is in cache and sometimes not.
>
> Ok, enough hand waving for one post :) Sorry if this all is obvious /
> discussed before. My googling didn't turn out anything directly related,
> although these have some similarity:
> - Per-table random_page_cost for tables that we know are always cached
> [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php]
> - Script to compute random page cost
> [http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php]
> - The science of optimization in practical terms?
> [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php] getting
> really interesting starting from here:
> [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]

late reply.
You can add this link to your list:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg01140.php

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

In response to

Re: index-only scans at 2011-08-16 10:24:25 from Anssi Kääriäinen

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Thom Brown	2011-09-23 14:37:12	Re: [pgsql-advocacy] Unlogged vs. In-Memory
Previous Message	Magnus Hagander	2011-09-23 14:31:37	Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present