Switch PgStat_HashKey.objoid from Oid to uint64

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Switch PgStat_HashKey.objoid from Oid to uint64
Date: 2024-08-26 00:58:51
Message-ID: ZsvTS9EW79Up8I62@paquier.xyz
Lists: pgsql-hackers

Hi all,

While working more on the cumulative pgstats and its interactions with
pg_stat_statements, one thing that I have been annoyed with is that
the dshash key for variable-numbered stats uses a pair of (Oid dboid,
Oid objoid), mostly to stick with the fact that most of the stats are
dealing with system objects.

That's not completely true, though, as statistics can also implement
their own index numbering without storing these numbers to disk, by
defining {from,to}_serialized_name. Replication slots do that, so we
are already treating as OIDs some numbers that are not OIDs.

For pg_stat_statements, one issue with the current pgstats is that we
want to use the query ID as hash key, which is 8 bytes, while also
having some knowledge of the database OID because we want to be able
to clean up stats entries about specific databases.

Please find attached a patch switching PgStat_HashKey.objoid from an
Oid to uint64 to be able to handle cases of stats that want more
space. The size of PgStat_HashKey is increased from 12 to 16 bytes,
but with alignment the size of PgStatShared_HashEntry (what's stored
in the dshash) is unchanged at 32 bytes.
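
To illustrate the size arithmetic above, here is a minimal sketch using
simplified stand-in structs. It does not include the real PostgreSQL
headers: the Sketch* names, the member list of the entry struct, the
4-byte kind and the 8-byte dsa_pointer are all assumptions made for the
example, and the 32-byte result assumes a typical 64-bit platform where
uint64_t is 8-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for PostgreSQL types (assumptions, not the real headers). */
typedef uint32_t SketchOid;         /* an Oid is a 4-byte unsigned integer */
typedef uint64_t SketchDsaPointer;  /* assumes an 8-byte dsa_pointer */

/* Key layout before the patch: three 4-byte members, 12 bytes. */
typedef struct SketchKeyOld
{
    int32_t     kind;       /* assumes a 4-byte stats kind */
    SketchOid   dboid;
    SketchOid   objoid;
} SketchKeyOld;

/* Key layout after the patch: objoid widened to 8 bytes, 16 bytes. */
typedef struct SketchKeyNew
{
    int32_t     kind;
    SketchOid   dboid;
    uint64_t    objoid;
} SketchKeyNew;

/* Hypothetical entry layouts wrapping each key. */
typedef struct SketchEntryOld
{
    SketchKeyOld     key;
    bool             dropped;
    uint32_t         refcount;  /* stand-in for an atomic 4-byte counter */
    SketchDsaPointer body;
} SketchEntryOld;

typedef struct SketchEntryNew
{
    SketchKeyNew     key;
    bool             dropped;
    uint32_t         refcount;
    SketchDsaPointer body;
} SketchEntryNew;

/* Print the four sizes so the padding effect can be compared. */
void
sketch_report_sizes(void)
{
    printf("key:   old=%zu new=%zu\n",
           sizeof(SketchKeyOld), sizeof(SketchKeyNew));
    printf("entry: old=%zu new=%zu\n",
           sizeof(SketchEntryOld), sizeof(SketchEntryNew));
}
```

In the old layout the 4 bytes between refcount and the 8-byte body are
padding; the wider key simply uses the space that padding was already
occupying, which is why the entry size does not move in this sketch.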

Perhaps what's proposed here is a bad idea for a good reason, and we
could just live with storing 4 bytes of the query ID in the dshash
instead of 8. Still, we make a lot of effort to use 8 bytes for query
IDs to reduce conflicts between different statements.
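
As a small illustration of the conflict concern, the sketch below shows
two distinct 8-byte query IDs that become identical once only their low
4 bytes are kept. The function name and the sample values are
hypothetical and chosen only for the example; this is not how
pg_stat_statements or pgstats compute their keys.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep only the low 4 bytes of an 8-byte query ID (illustration only). */
uint32_t
sketch_truncate_query_id(uint64_t query_id)
{
    return (uint32_t) query_id;
}

/*
 * Report whether two different query IDs would end up with the same
 * 4-byte value after truncation.
 */
bool
sketch_truncation_conflict(uint64_t a, uint64_t b)
{
    return a != b &&
        sketch_truncate_query_id(a) == sketch_truncate_query_id(b);
}
```

For example, 0x00000001DEADBEEF and 0x00000002DEADBEEF differ only in
their upper half, so they would share one entry under a 4-byte key
while remaining distinct under an 8-byte key.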

Another thing to note is the change for xl_xact_stats_item, requiring
a bump of XLOG_PAGE_MAGIC. A second thing is pg_stat_have_stats that
needs to use a different argument than an OID for the object,
requiring a catversion bump.

An interesting thing is that I have seen ubsan complain about this
patch. WAL records of type xl_xact_commit built with
XACT_XINFO_HAS_DROPPED_STATS are parsed as xl_xact_stats_item, which
now requires an 8-byte alignment (see the pg_waldump TAP reports when
using the attached), but we don't enforce anything as the data of such
WAL records is added with a simple XLogRegisterData(). The report
looks like this:
# xactdesc.c:91:28: runtime error: member access within misaligned
#   address 0x5651e996b86c for type 'struct xl_xact_stats_items', which
#   requires 8 byte alignment
# 0x5651e996b86c: note: pointer points here

TBH, I've looked at that for quite a bit. I thought about adding some
"dummy" member to some of the parsed structures to force some padding,
playing with the alignment macros, or forcing some alignment when
inserting the record, and I also looked at pg_attribute_aligned().
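
For readers less familiar with the alignment problem, here is a
minimal, self-contained sketch of two generic ways to deal with it:
rounding an address up to the next 8-byte boundary, and copying the
data into an aligned local variable instead of dereferencing the
buffer directly. Everything named Sketch* or sketch_* is hypothetical;
the macro only imitates the usual round-up idiom and is not taken from
PostgreSQL's headers, and the struct is a stand-in for an item with an
8-byte member. It is not what the attached patch does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Round "val" up to the next multiple of "alignval" (a power of two). */
#define SKETCH_ALIGN_UP(alignval, val) \
    (((uint64_t) (val) + ((uint64_t) (alignval) - 1)) & \
     ~((uint64_t) (alignval) - 1))

/* Stand-in for a stats item whose object ID is 8 bytes wide. */
typedef struct SketchStatsItem
{
    int32_t     kind;
    uint32_t    dboid;
    uint64_t    objoid;
} SketchStatsItem;

/* True if "addr" is a multiple of "alignval" (a power of two). */
bool
sketch_is_aligned(uint64_t addr, uint64_t alignval)
{
    return (addr & (alignval - 1)) == 0;
}

/* Function wrapper around the round-up macro. */
uint64_t
sketch_align_up(uint64_t addr, uint64_t alignval)
{
    return SKETCH_ALIGN_UP(alignval, addr);
}

/*
 * Read one item from a buffer position that may not be 8-byte aligned.
 * memcpy() has no alignment requirement on its source, so the copy is
 * well-defined even when casting "src" to a struct pointer would not be.
 */
SketchStatsItem
sketch_read_item(const unsigned char *src)
{
    SketchStatsItem item;

    memcpy(&item, src, sizeof(item));
    return item;
}
```

The address in the ubsan report, 0x5651e996b86c, is a multiple of 4 but
not of 8; rounding it up gives 0x5651e996b870. Padding on the write
side and copying on the read side are alternatives with different
costs: the first changes the record layout, the second adds a copy.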

First, I'm surprised that this has not shown up as an issue in this
area yet. Second, I could not get down to something "nice", but perhaps
there are preferred approaches when it comes to that and somebody has
a fancier idea? Or perhaps the problem is bigger than that due to
the way the record is designed and built? It also feels like I'm
missing something obvious, not sure what TBH. Still, I'm OK to paint
some more MAXALIGN()s, TYPEALIGN()s or similar to make sure that all
these deparsing pointers have a correct alignment, because this
deparsing stuff is about that, but I'm also wondering if there is an
argument for forcing that for the record itself? I'll think more
about that next week or so.

Anyway, I'm attaching that to the next CF for discussion for now, as
there could be objections about this whole idea, as well.

Thoughts or comments?
--
Michael

Attachment Content-Type Size
0001-Bump-PgStat_HashKey.objoid-to-be-8-bytes.patch text/x-diff 22.8 KB
