From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com> |
Subject: | Re: LogwrtResult contended spinlock |
Date: | 2021-01-30 02:30:11 |
Message-ID: | 20210130023011.n545o54j65t4kgxn@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2021-01-29 22:47:56 -0300, Alvaro Herrera wrote:
> So I tried this, but -- perhaps not suprisingly -- I can't get it to
> work properly; the synchronization fails.
What do you mean by "synchronization fails"?
> Strangely, all tests work for me, but the pg_upgrade one in particular
> fails.
It's one of the few tests using fsync=on.
> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> + AssertPointerAlignment(ptr, 8);
> +#endif
> + /* FIXME is this algorithm correct if we have u64 simulation? */
I don't see a problem.
> + while (true)
> + {
> + uint64 currval;
> +
> + currval = pg_atomic_read_u64(ptr);
> + if (currval > target_)
> + break; /* already done by somebody else */
> + if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> + break; /* successfully done */
> + }
> +}
I suggest writing this as
currval = pg_atomic_read_u64(ptr);
while (currval < target_)
{
if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
break;
}
> /*
> * Inserting to WAL is protected by a small fixed number of WAL insertion
> @@ -596,8 +599,10 @@ typedef struct XLogCtlData
> {
> XLogCtlInsert Insert;
>
> + XLogwrtAtomic LogwrtRqst;
> + XLogwrtAtomic LogwrtResult;
> +
> /* Protected by info_lck: */
> - XLogwrtRqst LogwrtRqst;
Not sure putting these into the same cacheline is a good idea.
> @@ -2166,12 +2163,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
> if (opportunistic)
> break;
>
> - /* Before waiting, get info_lck and update LogwrtResult */
> - SpinLockAcquire(&XLogCtl->info_lck);
> - if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
> - XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
> - LogwrtResult = XLogCtl->LogwrtResult;
> - SpinLockRelease(&XLogCtl->info_lck);
> + /* Before waiting, update LogwrtResult */
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, OldPageRqstPtr);
> +
> + LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> + LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
I don't think it's quite as easy as this. Write/Flush now aren't
guaranteed to be coherent with each other - previously they were. And
because it's in a global variable used everywhere, we can't just be more
careful about update protocols in one place...
We also shouldn't re-read a variable that we just did via the
pg_atomic_monotonic_advance_u64().
I think we should stop updating both the Write/Flush position at the
same time. That way we don't have an expectation of them being coherent
with each other. Most places really don't need both anyway.
> {
> - SpinLockAcquire(&XLogCtl->info_lck);
> - XLogCtl->LogwrtResult = LogwrtResult;
> - if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
> - XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
> - if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
> - XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
> - SpinLockRelease(&XLogCtl->info_lck);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Flush, LogwrtResult.Flush);
Hm. We went from one cheap atomic operation (SpinLockAcquire) to four
expensive ones in the happy path. That's not free...
I don't think we need to manipulate XLogCtl->LogwrtResult.* using atomic
ops - they can only be updated with WALWriteLock held, right?
XLogCtl->LogwrtResult was updated with plain assignment before, why did
you change it to pg_atomic_monotonic_advance_u64()?
> @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> {
> /* advance global request to include new block(s) */
> pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> + pg_memory_barrier();
> +
That's not really useful - the path that actually updates already
implies a barrier. It'd probably be better to add a barrier to a "never
executed cmpxchg" fastpath.
> @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
> WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
> LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> + pg_read_barrier();
I'm not sure you really can get away with just a read barrier in these
places. We can't e.g. have later updates to other shared memory
variables be "moved" to before the barrier (which a read barrier
allows).
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2021-01-30 02:33:36 | Re: Should we make Bitmapsets a kind of Node? |
Previous Message | Michael Paquier | 2021-01-30 02:23:01 | Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly |