From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-10-03 09:13:09
Message-ID: CAA4eK1LnO=Zf_wb0TbfJzXidxbijOtGoVb2FwJ4DH+Bt1JCbVg@mail.gmail.com
Lists: pgsql-hackers
On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
> On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote:
> >On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> >> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
> >> >
> >> >On further testing, I found that the patch seems to have problems with
> >> >toast. Consider below scenario:
> >> >Session-1
> >> >Create table large_text(t1 text);
> >> >INSERT INTO large_text
> >> >SELECT (SELECT string_agg('x', ',')
> >> >FROM generate_series(1, 1000000)) FROM generate_series(1, 1000);
> >> >
> >> >Session-2
> >> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
> >> >'test_decoding');
> >> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
> >> >*--kaboom*
> >> >
> >> >The second statement in Session-2 leads to a crash.
> >> >
> >>
> >> OK, thanks for the report - will investigate.
> >>
> >
> >It was an assertion failure in ReorderBufferCleanupTXN at below line:
> >+ /* Check we're not mixing changes from different transactions. */
> >+ Assert(change->txn == txn);
> >
>
> Can you still reproduce this issue with the patch I sent on 28/9? I have
> been unable to trigger the failure, and it seems pretty similar to the
> failure you reported (and I fixed) on 28/9.
>
Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids. I
think you need to create the replication slot in session-2 before creating
the table in session-1 to see this problem.
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
 	change->data.tuplecid.cmax = cmax;
 	change->data.tuplecid.combocid = combocid;
 	change->lsn = lsn;
+	change->txn = txn;
 	change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
 	dlist_push_tail(&txn->tuplecids, &change->node);
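For context, the way I read the failure: the cleanup path now asserts that
every change queued under a transaction points back at that transaction, so
any code path that allocates a ReorderBufferChange without setting
change->txn (as ReorderBufferAddNewTupleCids did before the above change)
trips the Assert. A rough sketch of that check, just to illustrate what I
understand the patch's cleanup loop to be doing (not the actual patch code):

static void
cleanup_tuplecids_sketch(ReorderBufferTXN *txn)
{
	dlist_mutable_iter iter;

	dlist_foreach_modify(iter, &txn->tuplecids)
	{
		ReorderBufferChange *change;

		change = dlist_container(ReorderBufferChange, node, iter.cur);

		/* Check we're not mixing changes from different transactions. */
		Assert(change->txn == txn);
	}
}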
A few more comments:
-----------------------------------
1.
+static bool
+check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
+{
+ /*
+ * -1 indicates fallback.
+ *
+ * If we haven't yet changed the boot_val default of -1, just let it be.
+ * logical decoding will look to maintenance_work_mem instead.
+ */
+ if (*newval == -1)
+ return true;
+
+ /*
+ * We clamp manually-set values to at least 64kB. The maintenance_work_mem
+ * uses a higher minimum value (1MB), so this is OK.
+ */
+ if (*newval < 64)
+ *newval = 64;
I think this needs to be changed now that we don't rely on
maintenance_work_mem. Another related point is that the default value for
logical_decoding_work_mem still seems to be -1; we need to make it 64MB. I
noticed this while debugging the memory accounting changes. I think this is
why I was not seeing toast-related changes being serialized: in that test, I
hadn't changed the default value of logical_decoding_work_mem.
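To be concrete, something along these lines is what I have in mind (a sketch
only, exact wording and limits are up for discussion): drop the -1 fallback
entirely, keep only the lower-bound clamp, and set the GUC's boot value to
64MB (65536 kB) in guc.c.

static bool
check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
{
	/*
	 * Clamp manually-set values to at least 64kB. There is no fallback to
	 * maintenance_work_mem anymore; the default itself would be 64MB.
	 */
	if (*newval < 64)
		*newval = 64;

	return true;
}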
2.
+ /*
+ * We're going modify the size of the change, so to make sure the
+ * accounting is correct we'll make it look like we're removing the
+ * change now (with the old size), and then re-add it at the end.
+ */
/going modify/going to modify/
3.
+ *
+ * While updating the existing change with detoasted tuple data, we need to
+ * update the memory accounting info, because the change size will differ.
+ * Otherwise the accounting may get out of sync, triggering serialization
+ * at unexpected times.
+ *
+ * We simply subtract size of the change before rejiggering the tuple, and
+ * then adding the new size. This makes it look like the change was removed
+ * and then added back, except it only tweaks the accounting info.
+ *
+ * In particular it can't trigger serialization, which would be pointless
+ * anyway as it happens during commit processing right before handing
+ * the change to the output plugin.
*/
static void
ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
@@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	if (txn->toast_hash == NULL)
 		return;
+ /*
+ * We're going modify the size of the change, so to make sure the
+ * accounting is correct we'll make it look like we're removing the
+ * change now (with the old size), and then re-add it at the end.
+ */
+ ReorderBufferChangeMemoryUpdate(rb, change, false);
It is not very clear why this change is required. Basically, this is done
at commit time, after which we shouldn't attempt to spill these changes
(the comments say as much). If that is the case, how and when can the
accounting create a problem? If possible, can you explain it with an
example?
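Just so we are talking about the same thing, the pattern I'm asking about is
the subtract-then-re-add bracket around the tuple rewrite, roughly as below
(a sketch only, assuming the ReorderBufferChangeMemoryUpdate() call as used
in the patch):

static void
resize_change_sketch(ReorderBuffer *rb, ReorderBufferChange *change)
{
	/* drop the change's old size from the accounted totals ... */
	ReorderBufferChangeMemoryUpdate(rb, change, false);

	/* ... rebuild the tuple here, which alters the change's size ... */

	/* ... and add it back with the new size */
	ReorderBufferChangeMemoryUpdate(rb, change, true);
}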
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com