Re: json(b)_array_elements use causes very large memory usage when also referencing entire json document

From: Lucas Fairchild-Madar <lucas(dot)madar(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: json(b)_array_elements use causes very large memory usage when also referencing entire json document
Date: 2017-10-06 20:03:52
Message-ID: CAJmoq7PdrJFteGWtQFyb6fBzbZft2mkG50=RjZNOtGyNRW+=5g@mail.gmail.com
Lists: pgsql-bugs

On Fri, Oct 6, 2017 at 12:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Lucas Fairchild-Madar <lucas(dot)madar(at)gmail(dot)com> writes:
> > Sure, here's a bash script. It assumes you have PGDATABASE, etc. set.
> > I verified this also blows up on pg 10. I haven't verified the new patch.
>
> Oh, I see the problem. This is a different animal, because it's actually
> an intra-row leak, as it were. What you've got is
>
> select data->'id', jsonb_array_elements(data->'items') from kaboom;
>
> where the SRF jsonb_array_elements() emits a lot of values. For each
> of those values, data->'id' gets evaluated over again, and we can't
> reclaim memory in the per-tuple context until we've finished the whole
> cycle for the current row of "kaboom". So a leak would occur in any
> case ... but it's particularly awful in this case, because data->'id'
> involves detoasting the rather wide value of "data", which is then
> promptly leaked. So the total memory consumption is more or less
> proportional to O(N^2) in the length of "data".
>
> This has been like this since forever, and it's probably impractical
> to do anything about it pre-v10, given the unstructured way that
> targetlist SRFs are handled. You could dodge the problem by moving
> the SRF to a lateral FROM item:
>
> select data->'id', ja
> from kaboom, lateral jsonb_array_elements(data->'items') as ja;
>
> (The LATERAL keyword is optional here, but I like it because it
> makes it clearer what's happening.)
>
> As of v10, it might be possible to fix this for the tlist case
> as well, by doing something like using a separate short-lived
> context for the non-SRF tlist items.
>

Is there any sort of setting right now that can defend against this? A way
to prevent a query from using 20+GB of memory? I'd prefer the query fail
before the database system is kill -9'd.
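[For reference, a minimal sketch of the pattern under discussion. The kaboom schema and the document shape are assumptions; the original reproduction script was not included in this message.]

```sql
-- Hypothetical minimal repro: a table with one wide jsonb document.
CREATE TABLE kaboom (data jsonb);

-- Build a document whose repeated detoasting drives the O(N^2) growth:
INSERT INTO kaboom
SELECT jsonb_build_object(
         'id', 1,
         'items', (SELECT jsonb_agg(jsonb_build_object('n', g))
                   FROM generate_series(1, 100000) g));

-- Leaky form: data->'id' is re-evaluated (and "data" re-detoasted)
-- once per value the SRF emits, and nothing is reclaimed until the
-- whole cycle for the current row finishes:
SELECT data->'id', jsonb_array_elements(data->'items') FROM kaboom;

-- Workaround from the reply above: move the SRF to a LATERAL FROM item:
SELECT data->'id', ja
FROM kaboom, LATERAL jsonb_array_elements(data->'items') AS ja;
```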
