From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Henry Hinze <henry(dot)hinze(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Subject: | Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop |
Date: | 2020-11-25 05:49:51 |
Message-ID: | CAHut+Ps2W7KP46zFQ9ybEckJoRKpT6zh6aPj31L2RV9qBcoLRg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Mon, Nov 23, 2020 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Peter, can you also please once test the attached and see if this
> fixes the problem for you as well?
I have reviewed the v3 patch code, and repeated the tablesync testing
(for a stream-enabled SUBSCRIPTION) using the debugging technique
previously described [1].
https://www.postgresql.org/message-id/CAHut%2BPt4PyKQCwqzQ%3DEFF%3DbpKKJD7XKt_S23F6L20ayQNxg77A%40mail.gmail.com
Now the test was successful. The stream start/stop/commit/abort are
all being handled by the tablesync worker without errors.
===
TESTING
I performed the tablesync stream test by multiple different ways of
inserting the en masse data:
1. Cause streaming (using a PREPARE TRANSACTION)
------------------------------------------------
psql -d test_pub -c "BEGIN;INSERT INTO test_tab SELECT i, md5(i::text)
FROM generate_series(3, 5000) s(i);PREPARE TRANSACTION
'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"
tablesync does 'S' (stream start)
tablesync does 'R'
tablesync does 'I' (insert) x n
tablesync does 'E' (stream end)
tablesync does 'S'
tablesync does 'I' x n
tablesync does 'E'
tablesync does { 'S', 'I' x n, 'E' } repeats 13 more times
tablesync does 'c' (stream commit)
- In handle_stream_commit the tablesync worker processes the spool
file, calling apply_dispatch for all the messages
- 'R'
- { 'I' + should_apply_changes_for_rel = true } x 5000
- then calls process_syncing_tables_for_sync, which sets the state to
SUBREL_STATE_SYNCDONE
- then tablesync worker process exits
Now the "apply" worker catches up.
- it runs all the same messages again but does nothing because
should_apply_changes_for_rel = false
2. Cause streaming (not using a PREPARE TRANSACTION)
----------------------------------------------------
psql -d test_pub -c "INSERT INTO test_tab SELECT i, md5(i::text) FROM
generate_series(3, 5000) s(i);"
gives very similar results to above
3. Causing a stream abort to happen
-----------------------------------
psql -d test_pub -c "BEGIN; INSERT INTO test_tab SELECT i,
md5(i::text) FROM generate_series(3, 5000) s(i); INSERT INTO test_tab
VALUES(4999, 'already exists'); END;"
tablesync does 'S' (stream start)
tablesync does 'E' (stream end)
tablesync does 'A' (stream abort)
tablesync does 'B'
tablesync does 'C'
- calls process_syncing_tables
- tablesync worker process exits
apply worker continues...
===
REVIEW COMMENTS
Despite the tests working OK I thought there should be a couple small
changes made for this patch, as follows.
(1) process_sync_tables should be last
IMO the process_syncing_tables should be always the *last* thing any
apply handler does, because it seems a bad idea for the worker to
change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
there is still a little bit more processing remaining. This is why I
made the same modification in the other WIP patch v26-0006 [2]
[2] https://www.postgresql.org/message-id/CAHut%2BPuQcLbjrBXM54%2By%2B%3DYsmEDFd3CCtp31K-_jS4Xka2vwbQ%40mail.gmail.com
So, I think the process_syncing_tables() call should be put *after*
the stream_cleanup_files() in function apply_handle_stream_commit().
But to do this means the process_syncing_tables has to be removed from
your new apply_handle_commit_internal function.
(2) misleading comment
/*
* Start a transaction on stream start, this transaction will be committed
* on the stream stop. We need the transaction for handling the buffile,
* used for serializing the streaming data and subxact info.
*/
That above comment (in the apply_handle_stream_start function) may now
be inaccurate/misleading for the case of tablesync worker, because I
think for tablesync you are explicitly avoiding the tx commit within
the apply_handle_stream_stop().
---
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | David Turoň | 2020-11-25 07:46:32 | Re: BUG #16743: psql doesn't show whole expression in stored column |
Previous Message | Manoj Kumar | 2020-11-25 04:39:37 | Re: BUG #16739: Temporary files not deleting from data folder on disk |