From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | marko(at)joh(dot)to, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15888: Bogus "idle in transaction" state for logical decoding client after creating a slot |
Date: | 2019-07-09 20:15:57 |
Message-ID: | 20190709201557.y5yt57z2za2hu6l7@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi,
On 2019-07-04 13:35:12 +0900, Michael Paquier wrote:
> On Wed, Jul 03, 2019 at 08:03:24AM +0000, PG Bug reporting form wrote:
> > Run:
> >
> > pg_recvlogical -S foo -f /dev/null -d dbname --create-slot --start
> >
> > and pg_stat_activity will show something like this until that session
> > disconnects:
> >
> > which is quite obviously bogus.
>
> I think that's the transaction which is started when exporting the
> snapshot at logical slot creation. Or in short:
> #2 0x000055b5f8e5720f in StartTransactionCommand () at xact.c:2699
> #3 0x000055b5f90fab4e in SnapBuildExportSnapshot
> (builder=0x55b5fa3427b0) at snapbuild.c:575
> #4 0x000055b5f90fe5df in CreateReplicationSlot (cmd=0x55b5fa2914e8)
> at walsender.c:843
>
> SnapBuildExportSnapshot() needs to keep a transaction context opened
> to keep the wanted xmin around. In PG >= 10, this problem gets
> "solved" by using NOEXPORT_SNAPSHOT in the CREATE_REPLICATION_SLOT
> command. One idea would be to enforce a commit in this case from
> CreateReplicationSlot/streamutil.c? We know that pg_recvlogical is
> not going to use this snapshot anyway...
The transaction ought to be finished by exec_replication_command()
calling SnapBuildClearExportedSnapshot(), which then does
AbortCurrentTransaction(). So something's wrong if that's not working
anymore.
I just tested this on a slightly older v12 checkout, and the general
mechanism works:
S1(replication):
postgres[13464][1]=# CREATE_REPLICATION_SLOT foo LOGICAL test_decoding;
S2(plain):
postgres[13317][1]=# SELECT state, wait_event_type, wait_event, backend_xid, backend_xmin FROM pg_stat_activity WHERE pid = 13464;
┌─────────────────────┬─────────────────┬────────────┬─────────────┬──────────────┐
│ state │ wait_event_type │ wait_event │ backend_xid │ backend_xmin │
├─────────────────────┼─────────────────┼────────────┼─────────────┼──────────────┤
│ idle in transaction │ Client │ ClientRead │ (null) │ 1494339463 │
└─────────────────────┴─────────────────┴────────────┴─────────────┴──────────────┘
(1 row)
S1(replication):
postgres[13464][1]=# IDENTIFY_SYSTEM;
┌─────────────────────┬──────────┬──────────────┬──────────┐
│ systemid │ timeline │ xlogpos │ dbname │
├─────────────────────┼──────────┼──────────────┼──────────┤
│ 6704433050907079638 │ 1 │ 3AC/1A25B700 │ postgres │
└─────────────────────┴──────────┴──────────────┴──────────┘
S2(plain):
postgres[13317][1]=# SELECT state, wait_event_type, wait_event, backend_xid, backend_xmin FROM pg_stat_activity WHERE pid = 13464;
┌───────┬─────────────────┬────────────┬─────────────┬──────────────┐
│ state │ wait_event_type │ wait_event │ backend_xid │ backend_xmin │
├───────┼─────────────────┼────────────┼─────────────┼──────────────┤
│ idle │ Client │ ClientRead │ (null) │ (null) │
└───────┴─────────────────┴────────────┴─────────────┴──────────────┘
(1 row)
I *can* reproduce with 9.6. It's worthwhile to note that the
transaction *is* actually aborted, via the path above. As far as I can
tell the problem is "solely" that we never reach a set_ps_display() that
resets the 'idle in transaction' bit, because START_REPLICATION "takes
over" the connection once started, and we don't go through
PostgresMain() again.
The problem doesn't exist in later versions, because there
exec_replication_command() does an explicit
/* Report to pgstat that this process is now idle */
pgstat_report_activity(STATE_IDLE, NULL);
That's due to
commit d02974e32e028fc078d8f5eca1d6a4516efb0aa6
Author: Magnus Hagander <magnus(at)hagander(dot)net>
Date: 2017-12-29 16:19:51 +0100
Properly set base backup backends to active in pg_stat_activity
which is post 9.6. Not sure I really feel a huge appetite for whacking
this around in the back branches.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2019-07-09 20:59:25 | Re: BUG #15899: Valgrind detects errors on create gist index |
Previous Message | Tom Lane | 2019-07-09 17:07:18 | Re: ALTER TABLE results in "ERROR: could not open relation with OID 43707388" |