From: Andres Freund <andres(at)anarazel(dot)de>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: snapbuild woes
Date: 2017-04-13 00:29:59
Message-ID: 20170413002959.apytyi5lkknauauj@alap3.anarazel.de
Lists: pgsql-hackers

Hi,
On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:

> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
> Date: Fri, 24 Feb 2017 21:39:03 +0100
> Subject: [PATCH 1/5] Reserve global xmin for create slot snapshot export
>
> Otherwise the VACUUM or pruning might remove tuples still needed by the
> exported snapshot.
> ---
> src/backend/replication/logical/logical.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
> index 5529ac8..57c392c 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
> * the slot machinery about the new limit. Once that's done the
> * ProcArrayLock can be released as the slot machinery now is
> * protecting against vacuum.
> + *
> + * Note that we only store the global xmin temporarily in the in-memory
> + * state so that the initial snapshot can be exported. After initial
> + * snapshot is done global xmin should be reset and not tracked anymore
> + * so we are fine with losing the global xmin after crash.
> * ----
> */
> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>
> slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
> slot->data.catalog_xmin = slot->effective_catalog_xmin;
> + slot->effective_xmin = slot->effective_catalog_xmin;

> void
> FreeDecodingContext(LogicalDecodingContext *ctx)
> {
> + ReplicationSlot *slot = MyReplicationSlot;
> +
> if (ctx->callbacks.shutdown_cb != NULL)
> shutdown_cb_wrapper(ctx);
>
> + /*
> + * Cleanup global xmin for the slot that we may have set in
> + * CreateInitDecodingContext().

Hm. Is that actually a meaningful point to do the cleanup? For one, this
also gets called by pg_logical_slot_get_changes_guts(); more
importantly, the snapshot stays exported until
SnapBuildClearExportedSnapshot(), i.e. until the next command? If we
rely on the snapshot magic done by ExportSnapshot(), it'd be worthwhile
to mention that...

> We do not take ProcArrayLock or similar
> + * since we only reset xmin here and there's not much harm done by a
> + * concurrent computation missing that.
> + */

Hum. I was prepared to complain about this, but ISTM that there's
absolutely no risk, because the following
ReplicationSlotsComputeRequiredXmin(false); actually does all the
necessary locking? But still, I don't see much point in the
optimization.

> This patch changes the code so that stored snapshots are only used for
> logical decoding restart but not for initial slot snapshot.

Yea, that's a very good point...

> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>
> return false;
> }
> - /* c) valid on disk state */
> - else if (SnapBuildRestore(builder, lsn))
> + /* c) valid on disk state and not exported snapshot */
> + else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
> + SnapBuildRestore(builder, lsn))

Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
interface, where it'd strictly speaking not be required, right?

> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards

A bit more commentary would be good. What does that protect us against?

> From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
> Date: Wed, 22 Feb 2017 00:57:33 +0100
> Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder
>
> Due to race condition, the xl_running_xacts might contain no longer
> running transactions. Previous coding tried to get around this by
> additional locking but that did not work correctly for committs. Instead
> try combining decoded commits and multiple xl_running_xacts to get the
> consistent snapshot.

Needs more explanation of the approach.

> @@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
> * simply track the number of in-progress toplevel transactions and
> * lower it whenever one commits or aborts. When that number
> * (builder->running.xcnt) reaches zero, we can go from FULL_SNAPSHOT
> - * to CONSISTENT.
> + * to CONSISTENT. Sometimes we might get xl_running_xacts which has
> + * all tracked transactions as finished. We'll need to restart tracking
> + * in that case and use previously collected committed transactions to
> + * purge transactions mistakenly marked as running in the
> + * xl_running_xacts which exist as a result of race condition in
> + * LogStandbySnapshot().

I'm not following this yet.

> @@ -1298,11 +1303,17 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
> * b) first encounter of a useable xl_running_xacts record. If we had
> * found one earlier we would either track running transactions (i.e.
> * builder->running.xcnt != 0) or be consistent (this function wouldn't
> - * get called).
> + * get called). However it's possible that we could not see all
> + * transactions that were marked as running in xl_running_xacts, so if
> + * we get new one that says all were closed but we are not consistent
> + * yet, we need to restart the tracking while taking previously seen
> + * transactions into account.

Then the preceding comment needs a heavier revision to match; "This is
the first!!! Or maybe not!" isn't easy to understand.

> */
> - else if (!builder->running.xcnt)
> + else if (!builder->running.xcnt ||
> + running->oldestRunningXid > builder->running.xmax)

Isn't that wrong under wraparound?

> {
> int off;
> + bool first = builder->running.xcnt == 0;
>
> /*
> * We only care about toplevel xids as those are the ones we
> @@ -1338,20 +1349,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
> builder->running.xmin = builder->running.xip[0];
> builder->running.xmax = builder->running.xip[running->xcnt - 1];
>
> +
> /* makes comparisons cheaper later */
> TransactionIdRetreat(builder->running.xmin);
> TransactionIdAdvance(builder->running.xmax);
>
> builder->state = SNAPBUILD_FULL_SNAPSHOT;
>
> - ereport(LOG,
> - (errmsg("logical decoding found initial starting point at %X/%X",
> - (uint32) (lsn >> 32), (uint32) lsn),
> - errdetail_plural("%u transaction needs to finish.",
> - "%u transactions need to finish.",
> - builder->running.xcnt,
> - (uint32) builder->running.xcnt)));
> -
> + /*
> + * If this is the first time we've seen xl_running_xacts, we are done.
> + */
> + if (first)
> + {
> + ereport(LOG,
> + (errmsg("logical decoding found initial starting point at %X/%X",
> + (uint32) (lsn >> 32), (uint32) lsn),
> + errdetail_plural("%u transaction needs to finish.",
> + "%u transactions need to finish.",
> + builder->running.xcnt,
> + (uint32) builder->running.xcnt)));
> + }
> + else
> + {
> + /*
> + * Because of the race condition in LogStandbySnapshot() the
> + * transactions recorded in xl_running_xacts as running might have
> + * already committed by the time the xl_running_xacts was written
> + * to WAL. Use the information about decoded transactions that we
> + * gathered so far to update our idea about what's still running.
> + *
> + * We can use SnapBuildEndTxn directly as it only does the
> + * transaction running check and handling without any additional
> + * side effects.
> + */
> + for (off = 0; off < builder->committed.xcnt; off++)
> + SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);
> + if (builder->state == SNAPBUILD_CONSISTENT)
> + return false;
> +
> + ereport(LOG,
> + (errmsg("logical decoding moved initial starting point to %X/%X",
> + (uint32) (lsn >> 32), (uint32) lsn),
> + errdetail_plural("%u transaction needs to finish.",
> + "%u transactions need to finish.",
> + builder->running.xcnt,
> + (uint32) builder->running.xcnt)));
> + }

Hm, this is not pretty.

> From 4217da872e9aa48750c020542d8bc22c863a3d75 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
> Date: Tue, 21 Feb 2017 19:58:18 +0100
> Subject: [PATCH 5/5] Skip unnecessary snapshot builds
>
> When doing initial snapshot build during logical decoding
> initialization, don't build snapshots for transactions where we know the
> transaction didn't do any catalog changes. Otherwise we might end up
> with thousands of useless snapshots on busy server which can be quite
> expensive.
> ---
> src/backend/replication/logical/snapbuild.c | 82 +++++++++++++++++++----------
> 1 file changed, 53 insertions(+), 29 deletions(-)
>
> diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
> index 1a1c9ba..c800aa5 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -954,6 +954,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
> bool forced_timetravel = false;
> bool sub_needs_timetravel = false;
> bool top_needs_timetravel = false;
> + bool skip_forced_snapshot = false;
>
> TransactionId xmax = xid;
>
> @@ -975,10 +976,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
> /*
> * We could avoid treating !SnapBuildTxnIsRunning transactions as
> * timetravel ones, but we want to be able to export a snapshot when
> - * we reached consistency.
> + * we reached consistency so we need to keep track of them.
> */
> forced_timetravel = true;
> elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
> +
> + /*
> + * It is however desirable to skip building new snapshot for
> + * !SnapBuildTxnIsRunning transactions as otherwise we might end up
> + * building thousands of unused snapshots on busy servers which can
> + * be very expensive.
> + */
> + if (!SnapBuildTxnIsRunning(builder, xid))
> + skip_forced_snapshot = true;
> }

That's pretty crudely bolted onto the existing logic; isn't there a
simpler way?

Greetings,

Andres Freund
