From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: WAL prefetch (another approach) |
Date: | 2020-03-17 06:32:55 |
Message-ID: | CA+hUKGJ06d3h5JeOtAv4h52n0vG1jOPZxqMCn5FySJQUVZA32w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Alvaro,
On Sat, Mar 14, 2020 at 10:15 AM Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I tried my luck at a quick read of this patchset.
Thanks! Here's a new patch set, and some inline responses to your feedback:
> I didn't manage to go over 0005 though, but I agree with Tomas that
> having this be configurable in terms of bytes of WAL is not very
> user-friendly.
The primary control is now maintenance_io_concurrency, which is
basically what Tomas suggested.
The byte-based control is just a cap to prevent it reading a crazy
distance ahead, that also functions as the on/off switch for the
feature. In this version I've added "max" to the name, to make that
clearer.
> First of all, let me join the crowd chanting that this is badly needed;
> I don't need to repeat what Chittenden's talk showed. "WAL recovery is
> now 10x-20x times faster" would be a good item for pg13 press release,
> I think.
We should be careful about over-promising here: Sean basically had a
best case scenario for this type of techology, partly due to his 16kB
filesystem blocks. Common results may be a lot more pedestrian,
though it could get more interesting if we figure out how to get rid
of FPWs...
> > From a61b4e00c42ace5db1608e02165f89094bf86391 Mon Sep 17 00:00:00 2001
> > From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > Date: Tue, 3 Dec 2019 17:13:40 +1300
> > Subject: [PATCH 1/5] Allow PrefetchBuffer() to be called with a SMgrRelation.
> >
> > Previously a Relation was required, but it's annoying to have
> > to create a "fake" one in recovery.
>
> LGTM.
>
> It's a pity to have to include smgr.h in bufmgr.h. Maybe it'd be sane
> to use a forward struct declaration and "struct SMgrRelation *" instead.
OK, done.
While staring at this, I decided that SharedPrefetchBuffer() was a
weird word order, so I changed it to PrefetchSharedBuffer(). Then, by
analogy, I figured I should also change the pre-existing function
LocalPrefetchBuffer() to PrefetchLocalBuffer(). Do you think this is
an improvement?
> > From acbff1444d0acce71b0218ce083df03992af1581 Mon Sep 17 00:00:00 2001
> > From: Thomas Munro <tmunro(at)postgresql(dot)org>
> > Date: Mon, 9 Dec 2019 17:10:17 +1300
> > Subject: [PATCH 2/5] Rename GetWalRcvWriteRecPtr() to GetWalRcvFlushRecPtr().
> >
> > The new name better reflects the fact that the value it returns
> > is updated only when received data has been flushed to disk.
> >
> > An upcoming patch will make use of the latest data that was
> > written without waiting for it to be flushed, so use more
> > precise function names.
>
> Ugh. (Not for your patch -- I mean for the existing naming convention).
> It would make sense to rename WalRcvData->receivedUpto in this commit,
> maybe to flushedUpto.
Ok, I renamed that variable and a related one. There are more things
you could rename if you pull on that thread some more, including
pg_stat_wal_receiver's received_lsn column, but I didn't do that in
this patch.
> > From d7fa7d82c5f68d0cccf441ce9e8dfa40f64d3e0d Mon Sep 17 00:00:00 2001
> > From: Thomas Munro <tmunro(at)postgresql(dot)org>
> > Date: Mon, 9 Dec 2019 17:22:07 +1300
> > Subject: [PATCH 3/5] Add WalRcvGetWriteRecPtr() (new definition).
> >
> > A later patch will read received WAL to prefetch referenced blocks,
> > without waiting for the data to be flushed to disk. To do that,
> > it needs to be able to see the write pointer advancing in shared
> > memory.
> >
> > The function formerly bearing name was recently renamed to
> > WalRcvGetFlushRecPtr(), which better described what it does.
>
> > + pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
>
> Umm, how come you're using WalRcv here instead of walrcv? I would flag
> this patch for sneaky nastiness if this weren't mostly harmless. (I
> think we should do away with local walrcv pointers altogether. But that
> should be a separate patch, I think.)
OK, done.
> > + pg_atomic_uint64 writtenUpto;
>
> Are we already using uint64s for XLogRecPtrs anywhere? This seems
> novel. Given this, I wonder if the comment near "mutex" needs an
> update ("except where atomics are used"), or perhaps just move the
> member to after the line with mutex.
Moved.
We use [u]int64 in various places in the replication code. Ideally
I'd have a magic way to say atomic<XLogRecPtr> so I didn't have to
assume that pg_atomic_uint64 is the right atomic integer width and
signedness, but here we are. In dsa.h I made a special typedef for
the atomic version of something else, but that's because the size of
that thing varied depending on the build, whereas our LSNs are of a
fixed width that ought to be en... <trails off>.
> I didn't understand the purpose of inc_counter() as written. Why not
> just pg_atomic_fetch_add_u64(..., 1)?
I didn't want counters that wrap at ~4 billion, but I did want to be
able to read and write concurrently without tearing. Instructions
like "lock xadd" would provide more guarantees that I don't need,
since only one thread is doing all the writing and there's no ordering
requirement. It's basically just counter++, but some platforms need a
spinlock to perform atomic read and write of 64 bit wide numbers, so
more hoop jumping is required.
> > /*
> > * smgrprefetch() -- Initiate asynchronous read of the specified block of a relation.
> > + *
> > + * In recovery only, this can return false to indicate that a file
> > + * doesn't exist (presumably it has been dropped by a later WAL
> > + * record).
> > */
> > -void
> > +bool
> > smgrprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum)
>
> I think this API, where the behavior of a low-level module changes
> depending on InRecovery, is confusingly crazy. I'd rather have the
> callers specifying whether they're OK with a file that doesn't exist.
Hmm. But... md.c has other code like that. It's true that I'm adding
InRecovery awareness to a function that didn't previously have it, but
that's just because we previously had no reason to prefetch stuff in
recovery.
> > +extern PrefetchBufferResult SharedPrefetchBuffer(SMgrRelation smgr_reln,
> > + ForkNumber forkNum,
> > + BlockNumber blockNum);
> > extern void PrefetchBuffer(Relation reln, ForkNumber forkNum,
> > BlockNumber blockNum);
>
> Umm, I would keep the return values of both these functions in sync.
> It's really strange that PrefetchBuffer does not return
> PrefetchBufferResult, don't you think?
Agreed, and changed. I suspect that other users of the main
PrefetchBuffer() call will eventually want that, to do a better job of
keeping the request queue full, for example bitmap heap scan and
(hypothetical) btree scan with prefetch.
Attachment | Content-Type | Size |
---|---|---|
0001-Allow-PrefetchBuffer-to-be-called-with-a-SMgrRela-v4.patch | text/x-patch | 7.1 KB |
0002-Rename-GetWalRcvWriteRecPtr-to-GetWalRcvFlushRecP-v4.patch | text/x-patch | 10.5 KB |
0003-Add-WalRcvGetWriteRecPtr-new-definition-v4.patch | text/x-patch | 3.7 KB |
0004-Allow-PrefetchBuffer-to-report-what-happened-v4.patch | text/x-patch | 9.6 KB |
0005-Prefetch-referenced-blocks-during-recovery-v4.patch | text/x-patch | 46.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-03-17 06:37:49 | Re: Collation versioning |
Previous Message | Amit Kapila | 2020-03-17 06:21:09 | Re: error context for vacuum to include block number |