From: | KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize kernel readahead using buffer access strategy |
Date: | 2013-11-18 02:02:59 |
Message-ID: | 52897553.6000006@lab.ntt.co.jp |
Lists: | pgsql-hackers |
(2013/11/15 13:48), Claudio Freire wrote:
> On Thu, Nov 14, 2013 at 11:13 PM, KONDO Mitsumasa
>> I use CentOS 6.4, whose kernel version is 2.6.32-358.23.2.el6.x86_64, in this
>> test.
>
> That's close to the kernel version I was using, so you should see the
> same effect.
OK. You proposed a patch that maximizes readahead. I think it can benefit
performance, and that part of your argument is certainly true.
>> Your patch maximizes readahead when a query uses an index range
>> scan. Is that right?
>
> Ehm... sorta.
>
>> I think that your patch assumes that pages are ordered by
>> index-data.
>
> No. It just knows which pages will be needed, and fadvises them. No
> guessing involved, except the guess that the scan will not be aborted.
> There's a heuristic to stop limited scans from attempting to fadvise,
> and that's that prefetch strategy is applied only from the Nth+ page
> walk.
We may be able to fully optimize kernel readahead in PostgreSQL in the future,
but it is very difficult, and it would take a long time to get there in a
single step. So I propose a GUC switch that users can set in their transactions
(I will write this patch in this CF). If someone wants to turn readahead off to
use the file cache more efficiently in their transactions, they can run
"SET readahead = off". PostgreSQL is open source, and I think it will become
clear which cases this is effective for once many people have used it.
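As a rough illustration of what such a switch might do underneath, here is a
minimal Python sketch. It assumes Linux semantics, where POSIX_FADV_RANDOM
turns off kernel readahead for a file descriptor and POSIX_FADV_NORMAL restores
the default. The function names and the boolean flag standing in for the GUC
are hypothetical and are not taken from the patch.

```python
import os
import tempfile


def advice_for(readahead: bool) -> int:
    """Map a hypothetical on/off readahead setting to a posix_fadvise hint."""
    # Assumption: "off" means no kernel readahead, "on" means kernel default.
    return os.POSIX_FADV_NORMAL if readahead else os.POSIX_FADV_RANDOM


def apply_readahead_setting(fd: int, readahead: bool) -> int:
    """Apply the hint to the whole file and return the advice that was used."""
    advice = advice_for(readahead)
    # offset=0, len=0 covers the file from the start to its end.
    os.posix_fadvise(fd, 0, 0, advice)
    return advice


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()
        # Roughly what "SET readahead = off" could translate to for one file.
        apply_readahead_setting(f.fileno(), readahead=False)
```

The hint is advisory only: the kernel is free to ignore it, so any benefit
would still have to be measured.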
> It improves index-only scans the most, but I also attempted to handle
> heap prefetches. That's where the kernel started conspiring against
> me, because I used many naturally-clustered indexes, and THERE
> performance was adversely affected because of that kernel bug.
I am also writing a Gaussian-distributed pgbench now and will submit it to this
CF. It should clarify, at least partially, which situations this is effective in.
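To show what a Gaussian-distributed access pattern means here, the following is
a minimal Python sketch of key selection only. It is not the pgbench patch: the
function name, the `spread` parameter, and the choice to redraw out-of-range
values are all assumptions made for illustration.

```python
import random


def gaussian_key(rng: random.Random, n_keys: int, spread: float = 6.0) -> int:
    """Draw a key in [1, n_keys] from a normal distribution centred mid-range.

    spread (hypothetical) is how many standard deviations span the key range.
    Draws that fall outside the range are rejected and redrawn.
    """
    mean = (n_keys + 1) / 2.0
    stddev = n_keys / spread
    while True:
        key = round(rng.gauss(mean, stddev))
        if 1 <= key <= n_keys:
            return key


if __name__ == "__main__":
    rng = random.Random(42)
    # Keys cluster around the middle of the range, so a small set of pages
    # is accessed much more often than under a uniform distribution.
    print([gaussian_key(rng, 100_000) for _ in range(10)])
```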
>>> You may want to try your patch with more
>>> real workloads, and maybe you'll confirm what I found out last time I
>>> messed with posix_fadvise. If my experience is still relevant, those
>>> patterns will have suffered a severe performance penalty with this
>>> patch, because it will disable kernel read-ahead on sequential index
>>> access. It may still work for sequential heap scans, because the
>>> access strategy will tell the kernel to do read-ahead, but many other
>>> access methods will suffer.
>>
>> The decisive difference from your patch is that my patch uses the buffer hint
>> control architecture, so it can control readahead more smartly in some cases.
>
> Indeed, but it's not enough. See my above comment about naturally
> clustered indexes. The planner expects that, and plans accordingly. It
> will notice correlation between a PK and physical location, and will
> treat an index scan over PK to be almost sequential. With your patch,
> that assumption will be broken I believe.
~
>> However, my patch is still in progress and needs more improvement. I am going
>> to add a way to control readahead by GUC, so that users can freely select
>> the readahead parameter in their transactions.
>
> Rather, I'd try to avoid fadvising consecutive or almost-consecutive
> blocks. Detecting that is hard at the block level, but maybe you can
> tie that detection into the planner, and specify a sequential strategy
> when the planner expects index-heap correlation?
I think we had better develop these patches step by step, because it is
difficult to achieve complete readahead optimization in a single first patch.
We need a framework in these patches first.
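The suggestion quoted above, to avoid fadvising consecutive or
almost-consecutive blocks, could be sketched as follows. This is a naive
block-level filter in Python that assumes the list of block numbers is known in
advance. It does not use planner correlation, and `blocks_to_prefetch` and
`max_gap` are hypothetical names.

```python
from typing import Iterable, List, Optional


def blocks_to_prefetch(blocks: Iterable[int], max_gap: int = 1) -> List[int]:
    """Return the block numbers worth an explicit prefetch hint.

    A block is skipped when it lies at most max_gap blocks after the previous
    one, on the assumption that kernel readahead already covers such
    near-sequential access.
    """
    selected: List[int] = []
    prev: Optional[int] = None
    for blk in blocks:
        if prev is None or not (0 <= blk - prev <= max_gap):
            selected.append(blk)
        prev = blk
    return selected


if __name__ == "__main__":
    # 11, 12 and 51 directly follow their predecessors, so they are left
    # to kernel readahead.
    print(blocks_to_prefetch([10, 11, 12, 50, 51, 7]))  # [10, 50, 7]
```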
>>> Try OLAP-style queries.
>>
>> I have DBT-3(TPC-H) benchmark tools. If you don't like TPC-H, could you tell
>> me good OLAP benchmark tools?
>
> I don't really know. Skimming the specs, I'm not sure if those queries
> generate large index range queries. You could try, maybe with
> autoexplain?
OK, I will. I will also use simple large index range queries with the EXPLAIN command.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center