Re: array size exceeds the maximum allowed (1073741823) when building a json

From: Nicolas Paris <niparisco(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: array size exceeds the maximum allowed (1073741823) when building a json
Date: 2016-06-09 13:43:07
Message-ID: CA+ssMOSuBmUFWvWn_=6vow63Wp7seEhUF_TDCO7Duip3j2GUUw@mail.gmail.com
Lists: pgsql-performance

2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure(at)gmail(dot)com>:

> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
> >>>> Will this 1GB restriction be increased in the near future?
> >
> >>> Not planned, no. Thing is, that's the limit for a field in general, not
> >>> just JSON; changing it would be a fairly large patch. It's desirable,
> >>> but AFAIK nobody is working on it.
> >
> >> And there are other things to consider on top of that, like the
> >> maximum allocation size for palloc, the maximum query string size,
> >> COPY, etc. This is no small project, and the potential side-effects
> >> should not be underestimated.
> >
> > It's also fair to doubt that client-side code would "just work" with
> > no functionality or performance problems for such large values.
> >
> > I await with interest the OP's results on other JSON processors that
> > have no issues with GB-sized JSON strings.
>
> Yup. Most json libraries and tools are going to be disgusting memory
> hogs or have exponential behaviors especially when you consider you
> are doing the transformation as well. Just prettifying json documents
> over 1GB can be a real challenge.
>
> Fortunately the workaround here is pretty easy. Keep your query
> exactly as is but remove the final aggregation step so that it returns
> a set. Next, make a small application that runs this query and does
> the array bits around each row (basically prepending the final result
> with [, appending the final result with ], and putting , between rows).
>
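If I follow the suggestion, the client side would look roughly like the sketch
below (JDBC is used only as an example; the connection string, query and the
jsonrow column name are hypothetical). Whether each row's json text can really
be appended as-is is exactly what I am unsure about, see my question below.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JsonArrayWriter {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and query: the query is the original one
        // minus the final json_agg step, so it returns one json document per row.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/db", "user", "pass");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT row_to_json(t) AS jsonrow FROM my_table t");
             BufferedWriter out = new BufferedWriter(new FileWriter("result.json"))) {
            out.write("[");                          // prepend the opening bracket
            boolean first = true;
            while (rs.next()) {
                if (!first) out.write(",");          // comma between rows
                out.write(rs.getString("jsonrow"));  // row appended as-is -- my escaping question below
                first = false;
            }
            out.write("]");                          // append the closing bracket
        }
    }
}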

The point is that prepending/appending means dealing with strings.
Transforming each value of the result set into a string implies escaping the
double quotes.
Then:
row1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append -> "[{\"hello\":\"world\"},"
and so on
so the json ends up corrupted. Hopefully I am just on the wrong track here.

> It's essential that you use a client library that does not buffer the
> entire result in memory before emitting results. This can be done in
> psql (FETCH mode), java, libpq (single row mode), etc. I suspect
> node.js pg module can do this as well, and there certainly will be
> others.
>
> The basic objective is you want the rows to be streamed out of the
> database without being buffered. If you do that, you should be able
> to stream arbitrarily large datasets out of the database to a json
> document assuming the server can produce the query.
>
> merlin
>
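For what it's worth, with the JDBC driver that non-buffering behaviour
apparently has to be asked for explicitly: cursor-based fetching only kicks in
with autocommit off and a positive fetch size. A minimal sketch of that
(assuming the PostgreSQL JDBC driver; connection details and query are
hypothetical, same as above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingFetch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/db", "user", "pass")) {
            // With the PostgreSQL JDBC driver, rows are streamed (cursor-based fetch)
            // only when autocommit is off and a positive fetch size is set.
            con.setAutoCommit(false);
            try (PreparedStatement st = con.prepareStatement(
                     "SELECT row_to_json(t) FROM my_table t")) {
                st.setFetchSize(1000);            // fetch 1000 rows at a time instead of buffering everything
                try (ResultSet rs = st.executeQuery()) {
                    while (rs.next()) {
                        process(rs.getString(1)); // handle each json row as it arrives
                    }
                }
            }
            con.commit();
        }
    }

    private static void process(String jsonRow) {
        // placeholder: write the row out, surrounded by the array bits as above
        System.out.println(jsonRow);
    }
}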
