| From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com> | 
| Subject: | Re: WAL prefetch | 
| Date: | 2018-06-19 11:03:27 | 
| Message-ID: | b303de54-86c2-7dee-19b1-938aa2ce5028@2ndquadrant.com | 
| Lists: | pgsql-hackers | 
On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote:
> 
> 
> On 18.06.2018 23:47, Andres Freund wrote:
>> On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
>>> On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <andres(at)anarazel(dot)de> 
>>> wrote:
>>>>> The posix_fadvise approach is not perfect, no doubt about that. But it
>>>>> works pretty well for bitmap heap scans, and it's about 13249x better
>>>>> (rough estimate) than the current solution (no prefetching).
>>>> Sure, but investing in an architecture we know might not live long also
>>>> has its cost. Especially if it's not that complicated to do better.
>>> My guesses are:
>>>
>>> - Using OS prefetching is a very small patch.
>>> - Prefetching into shared buffers is a much bigger patch.
>> Why?  The majority of the work is standing up a bgworker that does
>> prefetching (i.e. reads WAL, figures out reads not in s_b, does
>> prefetch). Allowing a configurable number + some synchronization between
>> them isn't that much more work.
> 
> I do not think that prefetching into shared buffers requires much more
> effort or makes the patch more invasive...
> It even simplifies it somewhat, because there is no need to maintain
> our own cache of prefetched pages...
> But it will definitely have much more impact on Postgres performance: 
> contention for buffer locks, throwing away pages accessed by read-only 
> queries,...
> 
> Also there are two points which make prefetching into shared buffers
> more complex:
> 1. Need to spawn multiple workers to prefetch in parallel and
> somehow distribute work between them.
> 2. Need to synchronize the recovery process with prefetch, to keep
> prefetch from going too far and doing useless work.
> The same problem exists for prefetch into the OS cache, but there the
> risk of false prefetch is less critical.
> 
I think the main challenge here is that all buffer reads are currently
synchronous (correct me if I'm wrong), while posix_fadvise() allows us
to prefetch the buffers asynchronously.

I don't think simply spawning a couple of bgworkers to prefetch buffers
is going to be equivalent to async prefetch, unless we support some sort
of async I/O. Maybe something has changed recently, but every time I
looked for a good portable async I/O API or library I got burned.

Now, maybe a couple of bgworkers prefetching buffers synchronously would
be good enough for WAL prefetching - after all, we only need to prefetch
data fast enough for the recovery not to wait. But I doubt it's going to
be good enough for bitmap heap scans, for example.

We need a prefetch mechanism that can fill the I/O queues with hundreds
of requests, and I don't think sync prefetch from a handful of bgworkers
can achieve that.
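
To make the asynchronous part concrete, here is a minimal sketch in C of what hint-based prefetching looks like. It is an illustration only, not code from the patch: prefetch_blocks(), read_block() and the BLOCK_SIZE constant are hypothetical names, and 8192 is assumed only because it is the PostgreSQL default block size. posix_fadvise() with POSIX_FADV_WILLNEED asks the kernel to start reading a range and normally returns without waiting for the I/O, so a single process can queue many requests before it issues its first blocking pread().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose posix_fadvise() and pread() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192           /* assumption: PostgreSQL default block size */

/*
 * Hypothetical helper: hint to the kernel that the listed blocks will be
 * needed soon. Each posix_fadvise() call only submits a hint; it normally
 * returns before the data has been read. Returns the number of hints the
 * kernel accepted.
 */
int
prefetch_blocks(int fd, const unsigned *blocks, int nblocks)
{
    int accepted = 0;

    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; i++)
    {
        off_t offset = (off_t) blocks[i] * BLOCK_SIZE;

        /* posix_fadvise() returns 0 on success, an error number otherwise */
        if (posix_fadvise(fd, offset, BLOCK_SIZE, POSIX_FADV_WILLNEED) == 0)
            accepted++;
    }
    return accepted;
}

/*
 * Hypothetical helper: synchronous read of one block. This call waits until
 * the data is available, which is quick if an earlier hint already brought
 * the block into the OS page cache.
 */
ssize_t
read_block(int fd, unsigned blkno, char *buf)
{
    return pread(fd, buf, BLOCK_SIZE, (off_t) blkno * BLOCK_SIZE);
}
```

A caller would pass the block numbers it expects to need soon to prefetch_blocks() and only later call read_block() for each of them. A worker doing only synchronous reads has roughly one request in flight at a time, so N such workers keep about N requests outstanding, whereas the hint-based loop can submit as many as the caller is willing to look ahead.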

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services