Re: High memory usage in CachedPlan for large IN clauses in partitioned table updates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: 赵庭海(庭章) <zhaotinghai(dot)zth(at)alibaba-inc(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: High memory usage in CachedPlan for large IN clauses in partitioned table updates
Date: 2025-03-24 15:44:13
Message-ID: 1002783.1742831053@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"=?UTF-8?B?6LW15bqt5rW3KOW6reeroCk=?=" <zhaotinghai(dot)zth(at)alibaba-inc(dot)com> writes:
> The testing query (more than 1000 condition after in clause):
> UPDATE test SET migrate_account_batch_id = 'newtest2' WHERE (id, store_id) IN (xxx);

As you've discovered, this is a lousy way to write such a query.
You'd get a better result by putting all the constants in a VALUES
clause that you join to, along the lines of

WITH v(v1, v2) as (VALUES (1,2),(3,4),...)
UPDATE test SET migrate_account_batch_id = 'newtest2'
FROM v
WHERE (id, store_id) = (v1, v2);

If the value pairs aren't known unique, instead

FROM (SELECT DISTINCT * FROM v) vv

> I would like to ask why aren't CachedPlans immediately released after execution
> when using extend protocol, so as to prevent multiple such CachedPlan from
> causing high memory usage.

If you use a named statement, the expectation is that the plan could
be re-used. You could use an unnamed statement if you don't want
that. But really the size of the plan is the least of your problems
with this query --- planning time has to be pretty awful as well.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Shaplov 2025-03-24 15:45:00 Re: vacuum_truncate configuration parameter and isset_offset
Previous Message Andres Freund 2025-03-24 15:43:47 Re: AIO v2.5