Hi,
Recently I noticed that when I use pgjdbc to batch-update a partitioned table, the
CachedPlan context can sometimes use a lot of memory. The problem occurs in many
PostgreSQL backend processes at once, which amplifies the impact. I tried to
reproduce the issue in my test environment.
The test query (more than 1000 tuples in the IN list):
UPDATE test SET migrate_account_batch_id = 'newtest2' WHERE (id, store_id) IN (xxx);
I used the backend function MemoryContextStats to dump the CachedPlan context for
this query to the server log; the output is:
CachedPlan: 33580752 total in 18 blocks; 7615592 free (1 chunks); 25965160 used.
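To put that number in perspective, here is a quick back-of-the-envelope calculation. The connection count below is hypothetical, just to show how the amplification across pooled backends plays out:

```python
# The dump above shows a single CachedPlan of ~32 MB per backend.
cached_plan_bytes = 33_580_752   # "total" figure from the MemoryContextStats dump
backends = 100                   # hypothetical number of pooled connections, not a measurement

total_mb = cached_plan_bytes * backends / (1024 ** 2)
print(f"{total_mb:.0f} MB")      # → 3203 MB across all backends
```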
This seems to use more memory than expected. I found that the reason is that a
filter like ((id, store_id) IN ((id1, id2), (id3, id4))) is represented in memory
as ((id = id1 AND store_id = id2) OR (id = id3 AND store_id = id4) OR ...), and
the planner generates a BitmapIndexScan path for every such condition, so when
there are many tuples in the IN list, the memory used grows accordingly. Moreover,
if the UPDATE touches multiple child tables of the partitioned table, this memory
is multiplied again.
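For illustration, the expansion can be modeled outside the backend. The sketch below is a toy model in Python, not the actual PostgreSQL planner code; it only shows that the number of generated quals (and hence candidate BitmapIndexScan paths) grows linearly with the number of tuples in the IN list:

```python
# Toy model of how a row-wise IN list is rewritten into OR-of-AND quals.
# Illustrative only; the real planner builds expression trees, not strings.

def expand_row_in(columns, rows):
    """Rewrite (col1, col2) IN ((v1, v2), ...) as an OR of per-row AND clauses."""
    ands = []
    for row in rows:
        conj = " AND ".join(f"{c} = {v!r}" for c, v in zip(columns, row))
        ands.append(f"({conj})")
    return " OR ".join(ands)

cols = ["id", "store_id"]
rows = [(i, i + 1) for i in range(1000)]   # 1000 tuples in the IN list
expanded = expand_row_in(cols, rows)

# Each tuple contributes its own AND arm, and the planner considers a
# BitmapIndexScan path for each arm; with N partitions, this is built N times.
print(expanded.count(" OR ") + 1)          # → 1000 arms
```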
I would like to ask why CachedPlans are not released immediately after execution
when using the extended protocol, which would prevent many such CachedPlans from
causing high memory usage. The current logic is that once an SQL statement has
been executed via the extended protocol, its CachedPlan stays resident in memory
even while it is not in use; it is only released, via PortalDrop, when the next
Bind message is processed.
Or maybe there is some more detailed code path that I have missed.
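The lifecycle I am describing can be sketched as a toy model. This is illustrative Python with made-up names, not the actual backend code; it just captures the behavior I observe:

```python
# Toy model of the lifecycle described above: a portal's cached plan
# survives Execute and is only dropped when the portal is bound again.

class Backend:
    def __init__(self):
        self.cached_plans = {}            # portal name -> plan blob

    def bind(self, portal, plan):
        # Re-binding the same portal drops the old plan (PortalDrop),
        # then installs the new one.
        self.cached_plans.pop(portal, None)
        self.cached_plans[portal] = plan

    def execute(self, portal):
        # Execution does NOT release the plan; it stays resident.
        return self.cached_plans[portal]

be = Backend()
be.bind("P0", "big plan (~32 MB)")
be.execute("P0")
assert "P0" in be.cached_plans            # still resident after execution
```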
Regards,
Tinghai Zhao