Re: BUG #15321: XMLTABLE leaks memory like crazy

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: BUG #15321: XMLTABLE leaks memory like crazy
Date: 2018-08-12 09:44:18
Message-ID: 87ftzk6jw2.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

>>>>> "Pavel" == Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:

>> We still need a per-result-tuple memory context otherwise we're
>> leaking whatever memory got allocated in each GetValue call into the
>> per-input-value context. (We can use our projection econtext's
>> per-tuple memory for this because we haven't done any evaluation of
>> output items for the current cycle yet at the point this code is
>> reached.)
>>
>> Again, look at functionscan for how this is supposed to work.

Pavel> it is done by tuplestore_putvalues

No, tuplestore_putvalues is only responsible for the memory it allocates
itself, which belongs to the tuplestore, it has nothing to do with the
memory allocated by its caller - and XmlTableGetValue does quite a few
allocations.

It's true that the only way (I think) to get a lot of result rows from
XMLTABLE is to use a large input value, the memory usage for which will
dwarf that of the output rows (a ~128MB xml value can easily mean using
~4.5GB of backend memory, which seems quite excessively profligate), but
my tests show that using that value, the lack of a per-GetValue context
reset costs perhaps another ~500MB on top for my specific testcase:

select count(*)
from (select ('<rec xmlns="http://x">'
|| repeat('<o><c>foo</c></o>',8000000+(i%10))
|| '</rec>')::xml as content
from generate_series(1,10) i offset 0) s,
xmltable(xmlnamespaces('http://x' as x),
'x:o'
passing s.content
columns
col1 text path 'x:c');

>> It's not useless at all; it's needed to avoid excess memory usage
>> when a single XMLTABLE() call returns many rows.

Pavel> When this context was not necessary before, then it is not need
Pavel> be used now. Tuplestore does all work

Before, you were using perValueCxt and resetting it once per GetValue
call. Since my patch takes perValueCxt to use for another purpose
instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
is suitable for the job (and consistent with functionscan).

The tuplestore does not do all the work - just look at XmlTableGetValue,
and notice that it has calls to text_to_cstring, pstrdup,
appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
which will allocate memory in the current memory context. All of that
needs to be reset on a per-output-tuple basis.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2018-08-12 10:02:31 Re: BUG #15321: XMLTABLE leaks memory like crazy
Previous Message Pavel Stehule 2018-08-12 08:59:10 Re: BUG #15321: XMLTABLE leaks memory like crazy