Re: WAL prefetch

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-19 12:33:37
Message-ID: 7d50a243-eb78-6a65-905c-4ddf425df16e@postgrespro.ru
Lists: pgsql-hackers

On 19.06.2018 14:03, Tomas Vondra wrote:
>
>
> On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote:
>>
>>
>> On 18.06.2018 23:47, Andres Freund wrote:
>>> On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
>>>> On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <andres(at)anarazel(dot)de>
>>>> wrote:
>>>>>> The posix_fadvise approach is not perfect, no doubt about that.
>>>>>> But it
>>>>>> works pretty well for bitmap heap scans, and it's about 13249x
>>>>>> better
>>>>>> (rough estimate) than the current solution (no prefetching).
>>>>> Sure, but investing in an architecture we know might not live long
>>>>> also
>>>>> has its cost. Especially if it's not that complicated to do better.
>>>> My guesses are:
>>>>
>>>> - Using OS prefetching is a very small patch.
>>>> - Prefetching into shared buffers is a much bigger patch.
>>> Why? The majority of the work is standing up a bgworker that does
>>> prefetching (i.e. reads WAL, figures out reads not in s_b, does
>>> prefetch). Allowing a configurable number + some synchronization
>>> between
>>> them isn't that much more work.
>>
>> I do not think that prefetching into shared buffers requires much more
>> effort or makes the patch more invasive...
>> It even simplifies it somewhat, because there is no need to maintain a
>> separate cache of prefetched pages...
>> But it will definitely have much more impact on Postgres performance:
>> contention for buffer locks, throwing away pages accessed by
>> read-only queries,...
>>
>> Also, there are two points which make prefetching into shared buffers
>> more complex:
>> 1. We need to spawn multiple workers to prefetch in parallel and
>> somehow distribute work between them.
>> 2. We need to synchronize the recovery process with prefetch, to prevent
>> prefetch from running too far ahead and doing useless work.
>> The same problem exists for prefetch into the OS cache, but there the
>> risk of false prefetch is less critical.
>>
>
> I think the main challenge here is that all buffer reads are currently
> synchronous (correct me if I'm wrong), while the posix_fadvise()
> allows us to prefetch the buffers asynchronously.

Yes, this is why we have to spawn several concurrent background workers
to perform prefetch.
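
To illustrate the difference discussed here, the following is a minimal
sketch, not code from any patch. It assumes a Linux-like system where
posix_fadvise() is available. The names prefetch_blocks, read_block and
BLOCK_SIZE are made up for this example. The advisory call only queues a
hint and returns immediately, while pread() blocks the caller until the
page is available.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose posix_fadvise() and pread() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed page size (PostgreSQL's default) */

/*
 * Asynchronous hint: ask the kernel to start reading each block into the
 * OS page cache. The call does not wait for the I/O to finish.
 * Returns the number of hints the kernel accepted.
 */
int
prefetch_blocks(int fd, const unsigned *blocks, size_t nblocks)
{
    int         accepted = 0;

    assert(blocks != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; i++)
    {
        off_t       offset = (off_t) blocks[i] * BLOCK_SIZE;

        /* posix_fadvise() returns 0 on success, an error number otherwise */
        if (posix_fadvise(fd, offset, BLOCK_SIZE, POSIX_FADV_WILLNEED) == 0)
            accepted++;
    }
    return accepted;
}

/*
 * Synchronous read, for comparison: the caller sleeps until the page has
 * been copied into "page". A worker doing this has at most one request
 * in flight at a time.
 */
ssize_t
read_block(int fd, unsigned block, char *page)
{
    return pread(fd, page, BLOCK_SIZE, (off_t) block * BLOCK_SIZE);
}
```

With the synchronous variant, the number of requests in flight is bounded
by the number of workers, which is why several of them are needed.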
>
> I don't think simply spawning a couple of bgworkers to prefetch
> buffers is going to be equal to async prefetch, unless we support some
> sort of async I/O. Maybe something has changed recently, but every
> time I looked for good portable async I/O API/library I got burned.
>
> Now, maybe a couple of bgworkers prefetching buffers synchronously
> would be good enough for WAL prefetching - after all, we only need to
> prefetch data fast enough for the recovery not to wait. But I doubt
> it's going to be good enough for bitmap heap scans, for example.
>
> We need a prefetch that allows filling the I/O queues with hundreds of
> requests, and I don't think sync prefetch from a handful of bgworkers
> can achieve that.
>
> regards
>
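
To make the two complications quoted earlier more concrete (distributing
work between workers, and keeping prefetch from running too far ahead of
recovery), here is a rough sketch. Everything in it is hypothetical:
PrefetchState, worker_for_block and should_prefetch are illustrative
names, not existing PostgreSQL structures or functions, and WAL positions
are simplified to plain 64-bit byte offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state; not an existing PostgreSQL structure. */
typedef struct PrefetchState
{
    uint64_t    replay_pos;     /* WAL position recovery has already applied */
    uint64_t    max_lookahead;  /* how far ahead of replay prefetch may run */
    int         nworkers;       /* number of prefetch workers, must be > 0 */
} PrefetchState;

/*
 * Point 1: distribute work. Hash (relation, block) so that a given page
 * is always handled by the same worker and is never fetched twice.
 */
int
worker_for_block(const PrefetchState *st, uint32_t relnode, uint32_t blkno)
{
    uint32_t    h;

    assert(st->nworkers > 0);
    h = (relnode * 2654435761u) ^ blkno;    /* simple multiplicative hash */
    return (int) (h % (uint32_t) st->nworkers);
}

/*
 * Point 2: synchronize with recovery. Prefetch only for WAL records that
 * are ahead of the replay position, but no further than max_lookahead.
 */
bool
should_prefetch(const PrefetchState *st, uint64_t record_pos)
{
    if (record_pos <= st->replay_pos)
        return false;           /* already replayed: prefetch is useless */
    return record_pos - st->replay_pos <= st->max_lookahead;
}
```

A worker would skip records for which should_prefetch() returns false
and wait for recovery to advance before looking further ahead.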

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
