Re: Querying one partition in a function takes locks on all partitions

From: Evgeny Morozov <postgresql4(at)realityexists(dot)net>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Querying one partition in a function takes locks on all partitions
Date: 2025-03-28 17:00:20
Message-ID: 01020195ddb28a27-4a576e04-8cd7-4a0b-abc7-acb901700ee7-000000@eu-west-1.amazonses.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 23/03/2025 2:35 pm, David Rowley wrote:
>> alter table entity_2 add column new_column text;
> Is this just an example command? You can't add a column to a
> partition directly.

Yes, it was just the simplest way I could think of to take an exclusive
lock. But on this note: I guess it takes a lock before checking that the
table is partition because it might otherwise get attached/detached as a
partition between doing the check and taking a lock, but it would be
nice if it took a shared lock to do all checks, then upgraded to an
exclusive lock to actually do the work. Then the user would not have to
wait for an exclusive lock only for the command to fail.

> What's going on is that PostgreSQL is creating a generic plan for your
> query, that is, a plan that will work with any parameter value that
> you give to your function. When the generic plan is executed and the
> locks are taken for the relations mentioned in the query, and since
> the plan is generic, it includes all partitions that could match any
> possible parameter value you could pass. When the locks are taken,
> it's not yet known which partition will be needed as the partition
> pruning that occurs only does so after the locks are taken.

I see, thank you for the explanation. This seems like a bad plan,
though, because even at query preparation time it can be determined that
only one partition will need to be scanned, since the query filters on
the partition key, so it may need to read *any one* partition, but never
all partitions. So in this case, isn't it better to avoid caching a
generic plan at all? Even if the locking issue is fixed in PG 18, isn't
such a plan likely to be sub-optimal in other ways (for a more complex
query)? I don't know anything about the internals of the query planner,
but I have run into other performance issues with SQL functions querying
partitioned tables. One function we have *sometimes* uses very slow
sequential scans instead of index scans, and I've had to work around
that with "set enable_seqscan = off".

In this case, even "set plan_cache_mode = force_custom_plan" doesn't
help somehow. Isn't that supposed to... make PG use a custom a plan?

Here is another weird thing I forgot to menton in the original post: for
a set-returning function whether one partition table is locked or all of
them seems to depend on whether you "select func" or "select from func"!

create function read_partition_rows(which_part int) returns
table(part_id int) as
'select part_id from entity where part_id = which_part;'
language sql stable;

select * from read_partition_rows(1); -- This takes a lock only on entity_1
select read_partition_rows(1); -- but this takes locks on entity_1 and
entity_2!

> There has been work done for PostgreSQL 18 which swaps the order of
> these operations and makes it so that the executor only obtains the
> locks on the partitions which will actually be scanned. Hopefully,
> we'll see that feature released with PostgreSQL 18 toward the end of
> 2025.

Happy to hear that! I hope that makes it in, but in the meanwhile, yes,
it would be nice to at least document this gotcha and any workarounds
for it. The only one I've found is to use a pl/PgSQL function.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2025-03-28 17:02:29 Re: BTREE index: field ordering
Previous Message Ron Johnson 2025-03-28 16:24:00 Re: BTREE index: field ordering