From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Skipping logical replication transactions on subscriber side |
Date: | 2022-04-02 08:13:46 |
Message-ID: | 20220402081346.GD3719101@rfd.leadboat.com |
Lists: | pgsql-hackers |
On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:
> It seems that 0/B0706F72 is not a random value. Two subscriber logs
> show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
> might show the next field in the pg_subscription catalog, i.e.,
> subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
> CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
> PUBLICATION pub WITH (disable_on_error = true, streaming = on,
> two_phase = on)".
>
> Given subscription.sql passes, something is wrong when we read the
> subskiplsn value by like "sub->skiplsn = subform->subskiplsn;".
That's a good clue. pg_type.typalign has never been able to represent
alignment as it works on AIX. There, the compiler gives a uint64 like pg_lsn
8-byte alignment, and the C struct layout follows from that. At the typalign
level, we have only these:
#define TYPALIGN_CHAR 'c' /* char alignment (i.e. unaligned) */
#define TYPALIGN_SHORT 's' /* short alignment (typically 2 bytes) */
#define TYPALIGN_INT 'i' /* int alignment (typically 4 bytes) */
#define TYPALIGN_DOUBLE 'd' /* double alignment (often 8 bytes) */
On AIX, pg_config.h gives these alignments:
#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 8
/* #undef ALIGNOF_LONG_LONG_INT */
/* #undef ALIGNOF_PG_INT128_TYPE */
#define ALIGNOF_SHORT 2
uint64 and pg_lsn use TYPALIGN_DOUBLE, which means 4-byte alignment on AIX.
There, they really need a typalign corresponding to ALIGNOF_LONG (8 bytes).
Hence, the C struct layout doesn't match the tuple layout. Columns
potentially affected:
[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1;
attrelid │ attname
─────────────────┼──────────────
pg_sequence │ seqstart
pg_sequence │ seqincrement
pg_sequence │ seqmax
pg_sequence │ seqmin
pg_sequence │ seqcache
pg_subscription │ subskiplsn
(6 rows)
The pg_sequence fields evade trouble, because there are exactly eight bytes
(two oids) before them.
Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
- Introduce a new typalign value suitable for uint64. This is more intrusive,
but it's more future-proof. Looking beyond catalog columns, it might
improve performance by avoiding unaligned reads.
> Is it possible to run the test again with the attached patch?
Logs attached. The test "passed", though it printed "poll_query_until timed
out" three times and took a while.
Attachment | Content-Type | Size |
---|---|---|
log-subscription-20220401c.tar.xz | application/octet-stream | 25.6 KB |