Re: BUG #15321: XMLTABLE leaks memory like crazy

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: BUG #15321: XMLTABLE leaks memory like crazy
Date: 2018-08-12 10:02:31
Message-ID: CAFj8pRDWORKuuLEvFN-tMp06nuD5g_+yPAWWVW7umanoriEwwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

2018-08-12 11:44 GMT+02:00 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:

> >>>>> "Pavel" == Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>
> >> We still need a per-result-tuple memory context otherwise we're
> >> leaking whatever memory got allocated in each GetValue call into the
> >> per-input-value context. (We can use our projection econtext's
> >> per-tuple memory for this because we haven't done any evaluation of
> >> output items for the current cycle yet at the point this code is
> >> reached.)
> >>
> >> Again, look at functionscan for how this is supposed to work.
>
> Pavel> it is done by tuplestore_putvalues
>
> No, tuplestore_putvalues is only responsible for the memory it allocates
> itself, which belongs to the tuplestore, it has nothing to do with the
> memory allocated by its caller - and XmlTableGetValue does quite a few
> allocations.
>

Maybe I used wrong words. Sorry, my English lang is not good.

In master tfuncLoadRows switch to tstate->perValueCxt. You change it by
switch econtext->ecxt_per_tuple_memory what is wrong I thing.

If we don't change memory context, we will stay inside perValueCxt. And
this context will be cleaned outside.

> It's true that the only way (I think) to get a lot of result rows from
> XMLTABLE is to use a large input value, the memory usage for which will
> dwarf that of the output rows (a ~128MB xml value can easily mean using
> ~4.5GB of backend memory, which seems quite excessively profligate), but
> my tests show that using that value, the lack of a per-GetValue context
> reset costs perhaps another ~500MB on top for my specific testcase:
>
> select count(*)
> from (select ('<rec xmlns="http://x">'
> || repeat('<o><c>foo</c></o>',8000000+(i%10))
> || '</rec>')::xml as content
> from generate_series(1,10) i offset 0) s,
> xmltable(xmlnamespaces('http://x' as x),
> 'x:o'
> passing s.content
> columns
> col1 text path 'x:c');
>
> >> It's not useless at all; it's needed to avoid excess memory usage
> >> when a single XMLTABLE() call returns many rows.
>
> Pavel> When this context was not necessary before, then it is not need
> Pavel> be used now. Tuplestore does all work
>
> Before, you were using perValueCxt and resetting it once per GetValue
> call. Since my patch takes perValueCxt to use for another purpose
> instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
> is suitable for the job (and consistent with functionscan).
>
>
I think so using cxt_per_tuple_memory is not necessary - and my patch is
working too.

Just I removed MemoryContextReset(tstate->perValueCxt) after
tuplestore_putvalues. It is possible, because tstate->perValueCxt is
cleaned immediately in caller tfuncFetchRows function.

> The tuplestore does not do all the work - just look at XmlTableGetValue,
> and notice that it has calls to text_to_cstring, pstrdup,
> appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
> which will allocate memory in the current memory context. All of that
> needs to be reset on a per-output-tuple basis.
>

But it has own context or it can works under perValueCxt

I think using two memory contexts is not necessary, and with just one
context, the code can be simpler.

Regards

Pavel

> --
> Andrew (irc:RhodiumToad)
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrew Gierth 2018-08-12 10:25:39 Re: BUG #15321: XMLTABLE leaks memory like crazy
Previous Message Andrew Gierth 2018-08-12 09:44:18 Re: BUG #15321: XMLTABLE leaks memory like crazy