Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2022-12-12 02:57:17
Message-ID: 20221212.115717.1183698540772933396.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 9 Dec 2022 14:33:39 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> The patch introduces concurrent readers for the WAL buffers; so far
> there have been only concurrent writers. In the patch, WALRead() takes just
> one lock (WALBufMappingLock) in shared mode to enable concurrent
> readers and does minimal things - checks if the requested WAL page is
> present in WAL buffers, if so, copies the page and releases the lock.
> I think taking just WALBufMappingLock is enough here as the concurrent
> writers depend on it to initialize and replace a page in WAL buffers.
>
> I'll add this to the next commitfest.
>
> Thoughts?
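
For reference, the read path described in the quoted text can be
sketched roughly as below. This is a single-threaded toy model, not
the patch's code: ToyWalBuffers, toy_slot_for(), toy_install_page()
and toy_read_page() are hypothetical names, the sizes are assumptions,
and the lock is only indicated by comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ   8192   /* assumed WAL page size for this sketch */
#define TOY_NBUFFERS 4      /* assumed number of WAL buffer slots */

/* Hypothetical model of the WAL buffer array (not PostgreSQL code). */
typedef struct ToyWalBuffers
{
    uint64_t      page_end[TOY_NBUFFERS];  /* end LSN of page in slot, 0 = empty */
    unsigned char pages[TOY_NBUFFERS][TOY_BLCKSZ];
} ToyWalBuffers;

/* Map an LSN to the buffer slot its page would occupy. */
static int
toy_slot_for(uint64_t lsn)
{
    return (int) ((lsn / TOY_BLCKSZ) % TOY_NBUFFERS);
}

/* Writer side: initialize a page in its slot, replacing any older page. */
static void
toy_install_page(ToyWalBuffers *b, uint64_t page_start, unsigned char fill)
{
    int slot = toy_slot_for(page_start);

    assert(page_start % TOY_BLCKSZ == 0);
    memset(b->pages[slot], fill, TOY_BLCKSZ);
    b->page_end[slot] = page_start + TOY_BLCKSZ;
}

/* Reader side: copy the page containing lsn if it is still buffered.
 * Returns 1 on a hit, 0 if the caller must fall back to reading the file. */
static int
toy_read_page(const ToyWalBuffers *b, uint64_t lsn, unsigned char *dst)
{
    uint64_t page_start = lsn - (lsn % TOY_BLCKSZ);
    int      slot = toy_slot_for(page_start);
    int      found = 0;

    /* the real code would take the mapping lock in shared mode here */
    if (b->page_end[slot] == page_start + TOY_BLCKSZ)
    {
        memcpy(dst, b->pages[slot], TOY_BLCKSZ);
        found = 1;
    }
    /* ... and release the lock here */
    return found;
}
```

Note that a hit always copies a full page, whatever the size of the
record the caller is after.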

This adds a copy of (at least) the whole page at every WAL *record*
read, and it fights with all WAL writers by holding WALBufMappingLock
on a very busy page during the copy. I'm a bit doubtful that it
results in an overall improvement. It seems to me that almost all
pread()s here are served from the file buffer, so it is unclear to me
that copying a whole WAL page (and then copying the target record
again) wins over a pread() call that copies only the record to read.
Do you have actual numbers on how often WAL reads go to disk, or on
the performance gain or real I/O reduction this patch offers?

This patch copies the bleeding-edge WAL page without recording the
(next) insertion point or checking whether all in-progress insertions
behind the target LSN have finished. Thus the copied page may have
holes. That said, the sequential nature of the reads and the fact
that WAL buffers are zero-initialized may make it work for recovery,
but I don't think it works for replication as well.
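
To illustrate the kind of hole meant here, below is a minimal toy
model, not PostgreSQL code. ToyPage and the toy_* functions are
hypothetical, the page size is an assumption, and records are filled
with nonzero marker bytes so that untouched (zero-initialized) space
can be counted. Two inserters reserve space in order, the second one
finishes first, and a reader copies the page before the first one has
written its data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192   /* assumed page size for this sketch */

/* Hypothetical model of one WAL buffer page. */
typedef struct ToyPage
{
    unsigned char data[TOY_PAGE_SIZE];
    size_t        reserved_upto;   /* next insertion point */
} ToyPage;

/* Zero-initialize the page, as the WAL buffers are. */
static void
toy_page_init(ToyPage *p)
{
    memset(p, 0, sizeof(*p));
}

/* Reserve len bytes and return their start offset; no data is copied yet. */
static size_t
toy_reserve(ToyPage *p, size_t len)
{
    size_t start = p->reserved_upto;

    assert(start + len <= TOY_PAGE_SIZE);
    p->reserved_upto = start + len;
    return start;
}

/* Complete an insertion by writing a nonzero marker into its range. */
static void
toy_fill(ToyPage *p, size_t start, size_t len, unsigned char marker)
{
    assert(marker != 0);
    memset(p->data + start, marker, len);
}

/* Count bytes in [0, upto) of a copied page that are still zero. */
static size_t
toy_count_holes(const unsigned char *copy, size_t upto)
{
    size_t holes = 0;

    for (size_t i = 0; i < upto; i++)
        if (copy[i] == 0)
            holes++;
    return holes;
}

/* A reserves, B reserves, B fills, the reader copies, then A fills. */
static size_t
toy_holes_in_racy_copy(void)
{
    ToyPage       page;
    unsigned char copy[TOY_PAGE_SIZE];
    size_t        a, b;

    toy_page_init(&page);
    a = toy_reserve(&page, 100);
    b = toy_reserve(&page, 50);
    toy_fill(&page, b, 50, 0xBB);
    memcpy(copy, page.data, TOY_PAGE_SIZE);   /* reader copies here */
    toy_fill(&page, a, 100, 0xAA);            /* too late for the copy */
    return toy_count_holes(copy, page.reserved_upto);
}
```

In this scenario the copy contains B's record preceded by A's
still-empty range, which a reader cannot tell apart from valid data
without knowing that A has not finished.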

I recall that one of the advantages of reading WAL records from
memory is that it allows walsender to presend records that have not
been written yet. So perhaps this feature should track how far the
buffer is filled with valid content (or how far we can presend).
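
One way to picture such tracking is the sketch below. It is only an
illustration under assumptions: ToyInsertState and the toy_* functions
are hypothetical, it is single-threaded, and it ignores locking
entirely. The "valid up to" point is the start of the oldest insertion
still in progress, or the insertion point if none is, and a reader
clamps its request to that point.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_INSERTERS 8   /* assumed number of concurrent inserters */

/* Hypothetical bookkeeping for in-progress insertions. */
typedef struct ToyInsertState
{
    uint64_t reserved_upto;               /* next insertion point */
    uint64_t start[TOY_MAX_INSERTERS];    /* start of each running insertion */
    int      busy[TOY_MAX_INSERTERS];     /* 1 while the slot is inserting */
} ToyInsertState;

/* Reserve len bytes; returns a slot number, or -1 if all slots are busy. */
static int
toy_begin_insert(ToyInsertState *s, uint64_t len)
{
    for (int i = 0; i < TOY_MAX_INSERTERS; i++)
    {
        if (!s->busy[i])
        {
            s->busy[i] = 1;
            s->start[i] = s->reserved_upto;
            s->reserved_upto += len;
            return i;
        }
    }
    return -1;
}

/* Mark the insertion in the given slot as finished. */
static void
toy_end_insert(ToyInsertState *s, int slot)
{
    assert(slot >= 0 && slot < TOY_MAX_INSERTERS && s->busy[slot]);
    s->busy[slot] = 0;
}

/* Everything before the returned position is fully written. */
static uint64_t
toy_valid_upto(const ToyInsertState *s)
{
    uint64_t upto = s->reserved_upto;

    for (int i = 0; i < TOY_MAX_INSERTERS; i++)
        if (s->busy[i] && s->start[i] < upto)
            upto = s->start[i];
    return upto;
}

/* Number of bytes of [start, start + len) that are safe to read or send. */
static uint64_t
toy_clamp_read(const ToyInsertState *s, uint64_t start, uint64_t len)
{
    uint64_t upto = toy_valid_upto(s);

    if (start >= upto)
        return 0;
    return (len < upto - start) ? len : upto - start;
}
```

With insertions A = [0, 100) and B = [100, 150), finishing B alone
leaves the valid point at 0; it advances to 150 only once A finishes
as well.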

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
