BUG #17936: Memory Leak when OPERATOR FAMILY use LANGUAGE SQL function

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: hu_yajun(at)qq(dot)com
Subject: BUG #17936: Memory Leak when OPERATOR FAMILY use LANGUAGE SQL function
Date: 2023-05-16 13:16:43
Message-ID: 17936-ccbf22f31c04b6cf@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17936
Logged by: yajun Hu
Email address: hu_yajun(at)qq(dot)com
PostgreSQL version: 15.3
Operating system: CentOS7 with kernel version 5.10
Description:

hi, community partners.
I guess I found a memory leak problem when OPERATOR FAMILY use LANGUAGE
SQL function.
** Problem **
I build postgresql with tag REL_15_3, and run following SQL.
--
CREATE OR REPLACE FUNCTION pg_catalog.text_cmp_bpchar(text, bpchar)
RETURNS int4
AS'select pg_catalog.bttextcmp($1, $2::text)'
LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

ALTER OPERATOR FAMILY text_ops USING btree ADD
OPERATOR 3 =(bpchar, bpchar),
FUNCTION 1 pg_catalog.text_cmp_bpchar(text, bpchar);

create table t1( a varchar,b varchar);
create table t2(a bpchar,b bpchar);
insert into t1 select 1,1;
insert into t2 select 1,1 from generate_series(1,1000000)i;
create index on t1(a);
set enable_hashjoin to off;
set enable_mergejoin to off;
set enable_seqscan to off;
set enable_material to off;
set enable_memoize to off;

explain select * from t1,t2 where t1.a = t2.a::bpchar; -- show sql plan
explain analyze select * from t1,t2 where t1.a = t2.a::bpchar; -- run sql
--
The plan is shown in the bellow, the backend memory can be found to be
soaring when explain analyze this SQL.
postgres=# explain select * from t1,t2 where t1.a = t2.a::bpchar;
QUERY PLAN
----------------------------------------------------------------------------------
Nested Loop (cost=10000000000.12..10000067714.50 rows=1947 width=128)
-> Seq Scan on t2 (cost=10000000000.00..10000008319.00 rows=389400
width=64)
-> Index Scan using t1_a_idx on t1 (cost=0.12..0.14 rows=1 width=64)
Index Cond: ((a)::bpchar = t2.a)
(4 rows)
I Use the 'top -c' to find that the memory usage of this session exceeds 15G
within 20s

** My simple analysis **
1. Each Index Scan on t1 will call _bt_first->ScanKeyEntryInitialize to
Initialize btree scanKey
2. Then in _bt_first, will call _bt_search/_bt_binsrch -> _bt_compare ->
FunctionCall2Coll to execute function to compare btree datum
3. Because we are using SQL functions here, FunctionCall2Coll will call
fmgr_sql and generate SQLFunctionCache into fcinfo->flinfo->fn_extra;
4. In _bt_first, we save fmgrinfo into inskey.scankeys[{keyid}]->sk_func,
that is, SQLFunctionCache is savad in
inskey.scankeys[{keyid}]->sk_func->fn_extra
5. In _bt_first, inskey is on the stack, so we lose the SQLFunctionCache
pointing after finish _bt_first
6. The MemoryContext of SQLFunctionCache has a long life cycle, so every
indexScan execution will leak memory of SQLFunctionCache once, and all the
memory will be released after the SQL run ends.

Is this a bug? Or is it a design problem?
I also reproduced this problem on the community master branch with commit
8cb94344c3c7130a0cd5e21e83705739f552187e
Maybe other codes have similar problems when calling fmgr_sql.

Regards, Yajun Hu

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Daniel Gustafsson 2023-05-16 13:31:42 Re: Memory leak on subquery as scalar operand
Previous Message PG Bug reporting form 2023-05-16 13:00:00 BUG #17935: Incorrect memory access in fuzzystrmatch/difference()