logical decoding issue with concurrent ALTER TYPE

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: callaan(at)amazon(dot)com
Subject: logical decoding issue with concurrent ALTER TYPE
Date: 2023-08-06 05:35:06
Message-ID: CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

A colleague Drew Callahan (in CC) has discovered that the logical
decoding doesn't handle syscache invalidation messages properly that
are generated by other transactions. Here is example (I've attached a
patch for isolation test),

-- Setup
CREATE TYPE comp AS (f1 int, f2 text);
CREATE TABLE tbl3(val1 int, val2 comp);
SELECT pg_create_logical_replication_slot('s', 'test_decoding');

-- Session 1
BEGIN;
INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2'));

-- Session 2
ALTER TYPE comp ADD ATTRIBUTE f3 int;

INSERT INTO tbl3 (val1, val2) VALUES (1, (1, '2', 3));
COMMIT;

pg_logical_slot_get_changes() returns:

BEGIN
table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
table public.tbl3: INSERT: val1[integer]:1 val2[comp]:'(1,2)'
COMMIT

However, the logical decoding should reflect the result of ALTER TYPE
and the val2 of the second INSERT output should be '(1,2,3)'.

The root cause of this behavior is that while ALTER TYPE can be run
concurrently to INSERT, the logical decoding doesn't handle cache
invalidation properly, and it got a cache hit of stale data (of
typecache in this case). Unlike snapshots that are stored in the
transaction’s reorder buffer changes, the invalidation messages of
other transactions are not distributed. As a result, the snapshot
becomes moot when we get a cache hit of stale data due to not
processing the invalidation messages again. This is not an issue for
ALTER TABLE and the like due to 2 phase locking and taking an
AccessExclusive lock. The issue with DMLs and ALTER TYPE has been
discovered but there might be other similar cases.

As far as I tested, this issue has existed since v9.4, where the
logical decoding was introduced, so it seems to be a long-standing
issue.

The simplest fix would be to invalidate all caches when setting a new
historical snapshot (i.e. applying CHANGE_INTERNAL_SNAPSHOT) but we
end up invalidating other caches unnecessarily too.

A better fix would be that when decoding the commit of a catalog
changing transaction, we distribute invalidation messages to other
concurrent transactions, like we do for snapshots. But we might not
need to distribute all types of invalidation messages, though.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
logical-decoding-ALTER-TYPE.patch application/octet-stream 3.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jacktby jacktby 2023-08-06 05:37:42 Re: How to add a new operator for parser?
Previous Message Julien Rouhaud 2023-08-06 05:18:53 Re: How to add a new operator for parser?