From: | Patrick Molgaard <draaglom(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Understanding logical replication lag |
Date: | 2021-01-21 18:21:20 |
Message-ID: | CA+Zan4uOGR2PmCYowN6hF-qi=x3_KTkr3SgJ=D23STN4epGWSg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi all,
*My top-level query is*: I'm using logical replication under pg 9.6 to do a
kind of change data capture and I'm seeing occasional extended periods of
significant lag. I'm not sure what conceptual model I'm missing in order to
understand why this happens.
*The details:*
I'm running Postgres 9.6.19 from the postgres debian apt repos & the
wal2json extension.
I have a custom client application which essentially executes
pg_logical_slot_get_changes() for some manually-created logical replication
slot on a loop.
I'm monitoring replication lag defined as pg_current_xlog_location() -
confirmed_flush_lsn for that slot.
What I'm observing is - very occasionally - an extended period (hours long)
wherein:
* The normal database write load continues or slightly increases
* calls to pg_logical_slot_get_changes() return no rows and
confirmed_flush_lsn doesn't move
* the duration of a call to pg_logical_slot_get_changes() rises linearly
over time
I understand from the docs and research that this is usually caused by a
long-running write transaction, but I notice I'm still confused.
* I'm not 100% sure - I'm still confirming - but I'm fairly confident that
I don't have any egregiously long write transactions (at least on that
scale of hours). Are there any other common scenarios that can result in a
similar 'blockage'? e.g some categories of long read-only transactions, or
advisory locks, or other kinds of database activity like a vacuum?
* Conversely, from experimenting, it seems as if not all long-running write
transactions cause pg_logical_slot_get_changes() to be unable to advance.
In fact, I'm not able so far to produce a minimal set of simple queries
which show that behaviour.
Given the following sequence of queries I see changes emitted:
-- session 1
begin;
insert into foo(bar,baz) values (1, 1);
-- session 2
begin;
insert into foo(bar,baz) values (2,2);
commit;
-- session 3
select data from
pg_logical_slot_get_changes('example-slot', NULL, NULL, 'format-version',
'2');
Session 3 is able to return the row from session 2 despite session 1's
ongoing transaction starting first and not yet committing. Can you help me
understand (or better yet point me to a resource which explains) the
underlying logic defining how logical decoding does in fact get blocked by
in-flight transactions?
Thanks,
Patrick
From | Date | Subject | |
---|---|---|---|
Next Message | Nagaraj Raj | 2021-01-22 01:53:26 | Query performance issue |
Previous Message | Pavel Stehule | 2021-01-18 18:17:06 | Re: proposal: schema variables |