From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | noah(at)leadboat(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz |
Subject: | Re: [HACKERS] WAL logging problem in 9.4.3? |
Date: | 2019-08-19 09:59:59 |
Message-ID: | 20190819.185959.118543656.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you for taking time.
At Sat, 17 Aug 2019 20:52:30 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190818035230(dot)GB3021338(at)rfd(dot)leadboat(dot)com>
> For two-phase commit, PrepareTransaction() needs to execute pending syncs.
Now TwoPhaseFileHeader has two new members for (commit-time)
pending syncs. Pending-syncs are useless on wal-replay, but that
is needed for commit-prepared.
> On Thu, Jul 25, 2019 at 10:39:36AM +0900, Kyotaro Horiguchi wrote:
> > --- a/src/backend/access/heap/heapam_handler.c
> > +++ b/src/backend/access/heap/heapam_handler.c
> > @@ -715,12 +702,6 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
...
> > - use_wal = XLogIsNeeded() && RelationNeedsWAL(NewHeap);
> > -
> > /* use_wal off requires smgr_targblock be initially invalid */
> > Assert(RelationGetTargetBlock(NewHeap) == InvalidBlockNumber);
>
> Since you're deleting the use_wal variable, update that last comment.
Oops. Rewrote it.
> > --- a/src/backend/catalog/storage.c
> > +++ b/src/backend/catalog/storage.c
> > @@ -428,21 +450,34 @@ smgrDoPendingDeletes(bool isCommit)
...
> > + smgrimmedsync(srel, MAIN_FORKNUM);
>
> This should sync all forks, not just MAIN_FORKNUM. Code that writes WAL for
> FSM_FORKNUM and VISIBILITYMAP_FORKNUM checks RelationNeedsWAL(). There may be
> no bug today, but it's conceptually wrong to make RelationNeedsWAL() return
> false due to this code, use RelationNeedsWAL() for multiple forks, and then
> not actually sync all forks.
I agree that all forks needs syncing, but FSM and VM are checking
RelationNeedsWAL(modified). To make sure, are you suggesting to
sync all forks instead of emitting WAL for them, or suggesting
that VM and FSM to emit WALs even when the modified
RelationNeedsWAL returns false (+ sync all forks)?
> The https://postgr.es/m/559FA0BA.3080808@iki.fi design had another component
> not appearing here. It said, "Instead, at COMMIT, we'd fsync() the relation,
> or if it's smaller than some threshold, WAL-log the contents of the whole file
> at that point." Please write the part to WAL-log the contents of small files
> instead of syncing them.
I'm not sure the point of the behavior. I suppose that the "log"
is a sequence of new_page records. It also needs to be synced and
it is always larger than the file to be synced. I can't think of
an appropriate threshold without the point.
> > --- a/src/backend/commands/copy.c
> > +++ b/src/backend/commands/copy.c
> > @@ -2725,28 +2722,9 @@ CopyFrom(CopyState cstate)
> > * If it does commit, we'll have done the table_finish_bulk_insert() at
> > * the bottom of this routine first.
> > *
> > - * As mentioned in comments in utils/rel.h, the in-same-transaction test
> > - * is not always set correctly, since in rare cases rd_newRelfilenodeSubid
> > - * can be cleared before the end of the transaction. The exact case is
> > - * when a relation sets a new relfilenode twice in same transaction, yet
> > - * the second one fails in an aborted subtransaction, e.g.
> > - *
> > - * BEGIN;
> > - * TRUNCATE t;
> > - * SAVEPOINT save;
> > - * TRUNCATE t;
> > - * ROLLBACK TO save;
> > - * COPY ...
>
> The comment material being deleted is still correct, so don't delete it.
> Moreover, the code managing rd_firstRelfilenodeSubid has a similar bug. The
> attached patch adds an assertion that RelationNeedsWAL() and the
> pendingDeletes array have the same opinion about the relfilenode, and it
> expands a test case to fail that assertion.
(Un?)Fortunately, that doesn't fail.. (with rebased version on
the recent master) I'll recheck that tomorrow.
> > --- a/src/include/utils/rel.h
> > +++ b/src/include/utils/rel.h
> > @@ -74,11 +74,13 @@ typedef struct RelationData
> > SubTransactionId rd_createSubid; /* rel was created in current xact */
> > SubTransactionId rd_newRelfilenodeSubid; /* new relfilenode assigned in
> > * current xact */
> > + SubTransactionId rd_firstRelfilenodeSubid; /* new relfilenode assigned
> > + * first in current xact */
>
> In general, to add a field like this, run "git grep -n 'rd_.*Subid'" and audit
> all the lines printed. Many bits of code need to look at all three,
> e.g. RelationClose().
Agreed. I'll recheck that.
> This field needs to be 100% reliable. In other words,
> it must equal InvalidSubTransactionId if and only if the relfilenode matches
> the relfilenode that would be in place if the top transaction rolled back.
I don't get this. I think the variable moves as you suggested. It
is handled same way with fd_new* in AtEOSubXact_cleanup but the
difference is in assignment but rollback. rd_fist* won't change
after the first assignment so rollback of the subid means
relfilenode is also rolled back to the initial value at the
beginning of the top transaction.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Rijkers | 2019-08-19 10:55:17 | Re: FETCH FIRST clause PERCENT option |
Previous Message | Surafel Temesgen | 2019-08-19 09:18:39 | Re: FETCH FIRST clause PERCENT option |