From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, michael(at)paquier(dot)xyz |
Subject: | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date: | 2020-03-30 04:41:01 |
Message-ID: | 20200330044101.GA2324620@rfd.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I think attached v41nm is ready for commit. Would anyone like to vote against
back-patching this? It's hard to justify lack of back-patch for a data-loss
bug, but this is atypically invasive. (I'm repeating the question, since some
folks missed my 2020-02-18 question.) Otherwise, I'll push this on Saturday.
On Mon, Mar 23, 2020 at 05:20:27PM +0900, Kyotaro Horiguchi wrote:
> At Sat, 21 Mar 2020 15:49:20 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in
> > The proximate cause is the RelFileNodeSkippingWAL() call that we added to
> > MarkBufferDirtyHint(). MarkBufferDirtyHint() runs in parallel workers, but
> > parallel workers have zeroes for pendingSyncHash and rd_*Subid.
> > Kyotaro, can you look through the affected code and propose a strategy for
> > good coexistence of parallel query with the WAL skipping mechanism?
>
> Bi-directional communication between leader and workers is too-much.
> It wouldn't be acceptable to inhibit the problematic operations on
> workers such like heap-prune or btree pin removal. If we do pending
> syncs just before worker start, it won't fix the issue.
>
> The attached patch passes a list of pending-sync relfilenodes at
> worker start.
If you were to issue pending syncs and also cease skipping WAL for affected
relations, that would fix the issue. Your design is better, though. I made
two notable changes:
- The patch was issuing syncs or FPIs every time a parallel worker exited. I
changed it to skip most of smgrDoPendingSyncs() in parallel workers, like
AtEOXact_RelationMap() does.
- PARALLEL_KEY_PENDING_SYNCS is most similar to PARALLEL_KEY_REINDEX_STATE and
PARALLEL_KEY_COMBO_CID. parallel.c, not execParallel.c, owns those. I
moved PARALLEL_KEY_PENDING_SYNCS to parallel.c, which also called for style
changes in the associated storage.c functions.
Since pendingSyncHash is always NULL under XLogIsNeeded(), I also removed some
XLogIsNeeded() tests that immediately preceded !pendingSyncHash tests.
Attachment | Content-Type | Size |
---|---|---|
skip-wal-v41nm.tar.gz | application/x-tar-gz | 192.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2020-03-30 04:56:43 | Re: Planning counters in pg_stat_statements (using pgss_store) |
Previous Message | Tom Lane | 2020-03-30 04:29:29 | Re: improve transparency of bitmap-only heap scans |