RE: Object identifier types in logical replication binary mode

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'emre(at)hasegeli(dot)com'" <emre(at)hasegeli(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: RE: Object identifier types in logical replication binary mode
Date: 2024-12-25 07:12:03
Message-ID: OSCPR01MB149667E80A900EE9EF4D7FE0BF50C2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Emre,

> I modified your test script to demonstrate the problem.

I could reproduce the error with your script, thanks. I also could reproduce the
same error with the simplified version, see attached.

I feel this is a normal behavior which can happen without the logical replication.
I could reproduce only by a single instance [1].

> My analysis on the original post was wrong. The subscriber handles it
> as an oid.

IIUC, this is what happened here. Maybe you have the same picture...

1. Publisher exported a data with the binary format. The data (regclass) was
represented as the oid.
2. Subscriber received the data. Since the destination table had text datatype,
it tried to understand as the text format.
3. Unfortunately, the original data had a byte string like "0x00". This could not
be used for the normal text so that an error happened.

The documentation has already pointed out that binary format has lesser portability
than textual one [2], and you seem to encounter the issue.

> Though, it's still not clear to me this is a desirable
> behaviour for the users, because oid's of the objects differ.

But it is not also clear that it is OK to transform the data to the string - it
is quite depends on the use-case. Personally, OID is meaningful only on the
instance so such tables should not be replicated to others. Can you exclude such
tables or attributes by the CREATE PUBLICATION?

[1]:
```
postgres=# CREATE TABLE t1 (a regclass); -- creates t1 with regclass datatype
CREATE TABLE
postgres=# INSERT INTO t1 VALUES ('t1'); -- insert a tuple
INSERT 0 1
postgres=# COPY t1 TO '/path/to/binary.dat' WITH ( FORMAT binary ); -- copy the tuple to somewhere
COPY 1
postgres=# CREATE TABLE t2 (a text); -- creates t1 with TEXT datatype
CREATE TABLE
postgres=# COPY t2 FROM '/path/to/binary.dat' WITH ( FORMAT binary ); -- do copy from
ERROR: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY t2, line 1, column a
```
[2]: https://www.postgresql.org/docs/devel/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-PARAMS-WITH-BINARY

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
test_1225.sh application/octet-stream 601 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2024-12-25 07:49:30 Re: ERROR: corrupt MVNDistinct entry
Previous Message Quan Zongliang 2024-12-25 06:32:51 Re: stored short varlena in array