From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com> |
Subject: | Re: Single transaction in the tablesync worker? |
Date: | 2020-12-08 06:22:49 |
Message-ID: | CAHut+Ptjk-Qgd3R1a1_tr62CmiswcYphuv0pLmVA-+2s8r0Bkw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Dec 7, 2020 at 7:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> As the slot of apply worker is created before all the tablesync
> workers it should never miss any LSN which tablesync workers would
> have processed. Also, the table sync workers should not process any
> xact if the apply worker has not processed anything. I think tablesync
> currently always processes one transaction (because we call
> process_sync_tables at commit of a txn) even if that is not required
> to be in sync with the apply worker. This should solve both the
> problems (a) visibility of partial transactions (b) allow prepared
> transactions because tablesync worker no longer needs to combine
> multiple transactions data.
>
> I think the other advantages of this would be that it would reduce the
> load (both CPU and I/O) on the publisher-side by allowing to decode
> the data only once instead of for each table sync worker once and
> separately for the apply worker. I think it will use fewer resources
> to finish the work.
Yes, I observed this same behavior.
IIUC the only way for the tablesync worker to go from CATCHUP mode to
SYNCDONE is via the call to process_sync_tables.
But a side-effect of this is, when messages arrive during this CATCHUP
phase one tx will be getting handled by the tablesync worker before
the process_sync_tables() is ever encountered.
I have created and attached a simple patch which allows the tablesync
to detect if there is anything to do *before* it enters the apply main
loop. Calling process_sync_tables() before the apply main loop offers
a quick way out so the message handling will not be split
unnecessarily between the workers.
~
The result of the patch is demonstrated by the following test/logs
which are also attached.
Note: I added more logging (not in this patch) to make it easier to
see what is going on.
LOGS1. Current code.
Test: 10 x INSERTS done at CATCHUP time.
Result: tablesync worker does 1 x INSERT, then apply worker skips 1
and does remaining 9 x INSERTs.
LOGS2. Patched code.
Test: Same 10 x INSERTS done at CATCHUP time.
Result: tablesync can exit early. apply worker handles all 10 x INSERTs
LOGS3. Patched code.
Test: 2PC PREPARE then COMMIT PREPARED [1] done at CATCHUP time
psql -d test_pub -c "BEGIN;INSERT INTO test_tab VALUES(1,
'foo');PREPARE TRANSACTION 'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"
Result: The PREPARE and COMMIT PREPARED are both handle by apply
worker. This avoids complications which the split otherwise causes.
[1] 2PC prepare test requires v29 patch from
https://www.postgresql.org/message-id/flat/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3%2BQ%40mail.gmail.com
---
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
LOGS1-10xInsertTest.txt | text/plain | 9.8 KB |
tablesync-early-exit.patch | application/octet-stream | 685 bytes |
LOGS3-Patched-PrepareTest.txt | text/plain | 4.0 KB |
LOGS2-Patched-10xInsertTest.txt | text/plain | 3.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2020-12-08 06:23:19 | Re: Blocking I/O, async I/O and io_uring |
Previous Message | k.jamison@fujitsu.com | 2020-12-08 06:17:52 | RE: [Patch] Optimize dropping of relation buffers using dlist |