Re: array size exceeds the maximum allowed (1073741823) when building a json

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Nicolas Paris <niparisco(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: array size exceeds the maximum allowed (1073741823) when building a json
Date: 2016-06-09 20:36:21
Message-ID: CAHyXU0wiGM8VzuRdAwipJ3g1sr6bKt2wtkm_56xF0GU74gf7xw@mail.gmail.com
Lists: pgsql-performance

On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
>
>
> 2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure(at)gmail(dot)com>:
>>
>> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>> >>>> Is this 1GB restriction supposed to increase in the near future?
>> >
>> >>> Not planned, no. Thing is, that's the limit for a field in general,
>> >>> not just JSON; changing it would be a fairly large patch. It's
>> >>> desirable, but AFAIK nobody is working on it.
>> >
>> >> And there are other things to consider on top of that, like the
>> >> maximum allocation size for palloc, the maximum query string size,
>> >> COPY, etc. This is no small project, and the potential side-effects
>> >> should not be underestimated.
>> >
>> > It's also fair to doubt that client-side code would "just work" with
>> > no functionality or performance problems for such large values.
>> >
>> > I await with interest the OP's results on other JSON processors that
>> > have no issues with GB-sized JSON strings.
>>
>> Yup. Most json libraries and tools are going to be disgusting memory
>> hogs or have exponential behaviors, especially when you consider you
>> are doing the transformation as well. Just prettifying json documents
>> over 1GB can be a real challenge.
>>
>> Fortunately, the workaround here is pretty easy. Keep your query
>> exactly as is, but remove the final aggregation step so that it returns
>> a set. Next, make a small application that runs this query and does
>> the array bits around each row (basically starting the final result
>> with '[', ending it with ']', and putting ',' between rows).
>
>
> The point is that prepending/appending means dealing with strings.
> Transforming each value of the result set to a string implies escaping the
> double quotes.
> Then:
> row1 contains {"hello":"world"}
> step 1 = prepend -> "[{\"hello\":\"world\"}"
> step 2 = append -> "[{\"hello\":\"world\"},"

right, say 3 rows contain {"hello":"world"}:

before iteration: emit '['
before every row except the first, prepend ','
after iteration: emit ']'

you end up with:
[{"hello":"world"}
,{"hello":"world"}
,{"hello":"world"}]

...which is 100% valid json as long as each row of the set is a json object.
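
for example, a rough client-side sketch of that loop (python here, as an
untested illustration; each row is already serialized json text, so it is
written out verbatim and nothing gets re-quoted or escaped):

import sys

rows = ['{"hello":"world"}'] * 3     # each row is already json text

sys.stdout.write('[')                # before iteration: emit '['
for i, row in enumerate(rows):
    if i > 0:
        sys.stdout.write(',')        # before every row except the first: emit ','
    sys.stdout.write(row)            # the row goes out verbatim, no escaping
sys.stdout.write(']')                # after iteration: emit ']'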

in SQL, the technique is like this:
select ('[' || string_agg(j::text, ',') || ']')::json
from (select json_build_object('hello', 'world') j
      from generate_series(1,3)) q;

the difference is that, instead of having the database do the string_agg
step, it's handled on the client during iteration over the output of
generate_series.
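
a rough version of that small client application might look like this
(python with psycopg2 here, untested; the connection string, cursor name,
and output file are placeholders, and the named cursor is just so rows
stream from the server instead of being buffered all at once):

import psycopg2

conn = psycopg2.connect("dbname=mydb")      # placeholder connection string
cur = conn.cursor(name='json_stream')       # named cursor: rows stream from the server
cur.itersize = 10000

# your real query goes here, minus the final aggregation step;
# ::text so each row arrives as json text rather than a parsed value
cur.execute("""
    select json_build_object('hello', 'world')::text
    from generate_series(1, 3)
""")

with open('out.json', 'w') as f:
    f.write('[')
    for i, (j,) in enumerate(cur):
        if i > 0:
            f.write(',')
        f.write(j)                          # already json text, written verbatim
    f.write(']')

cur.close()
conn.close()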

merlin
