System column support for partitioned tables using heap

From: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: System column support for partitioned tables using heap
Date: 2022-07-18 01:04:21
Message-ID: CAKqnccj7DaC0HhTLSB1FkLoqnGXv15GeB-Q1=x1V=RMVcha3=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've run into an existing behavior where xmax(), and various other system
tables, return an error when included in the RETURNING list on a
partitioned table.

ERROR: cannot retrieve a system column in this context
`
This issue got a fair airing back in 2020:

AW: posgres 12 bug (partitioned table)
https://www.postgresql.org/message-id/flat/GVAP278MB006939B1D7DFDD650E383FBFEACE0%40GVAP278MB0069.CHEP278.PROD.OUTLOOK.COM#908f2604081699e7f41fa20d352e1b79

I'm using 14.4, and just ran into this behavior today. I'm wondering if
there has been any new work on this subject, or anything to take into
account moving forward?

I'm not a C coder, and do not know the Postgres internals, but here's what
I gleaned from the thread:

* Available system columns depend on the underlying table access method,
and may/will vary across AMs. For example, the columns implemented by heap
is what the docs describe, an FDW could be anything, and Postgres has no
control of what, if any, system column-like attributes they support, and
future and hypothetical AMs may have different sets.

* Rather than return garbage results, or a default of 0, etc., the system
throws the error I ran into.

I'd be happier working with a NULL result than garbage, ambiguous results,
or errors...but an error is the current behavior. Agreed on that, I'd
rather an error than a bad/meaningless result. Postgres' consistent
emphasis on correctness is easily one of its greatest qualities.

In my case, I'm upgrading a lot of existing code to try and capture a more
complete profile of what an UPSERT did. Right now, I grab a count(*) of the
rows and return that. Works fine. A revised snippet looks a bit like this:

------------------------------------------------------------
...UPSERT code
returning xmax as inserted_transaction_id),

status_data AS (
select count(*) FILTER (where inserted_transaction_id = 0) AS
insert_count,
count(*) FILTER (where inserted_transaction_id != 0) AS
estimated_update_count,
pg_current_xact_id_if_assigned()::text AS
transaction_id

from inserted_rows),

...custom logging code

-- Final output/result.
select insert_count,
estimated_update_count,
transaction_id

from status_data;
------------------------------------------------------------

This fails on a partitioned table because xmax() may not exist. In fact, it
does exist in all of those tables, but the system doesn't know how to
guarantee that. I know which tables are partitioned, and can downgrade the
result on partitioned tables to the count(*) I've been using to date. But
now I'm wondering if working with xmax() like this is a poor idea going
forward. I don't want to lean on a feature/behavior that's likely to
change. For example, I noticed the other day that MERGE does not support
RETURNING.

I'd appreciate any insight or advice you can offer.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-07-18 01:05:56 pgsql: Default to hidden visibility for extension libraries where possi
Previous Message Tom Lane 2022-07-17 23:20:33 Re: Proposal to introduce a shuffle function to intarray extension