Re: Emitting JSON to file using COPY TO

From: Joe Conway <mail(at)joeconway(dot)com>
To: Sehrope Sarkuni <sehrope(at)jackdb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2023-12-06 21:28:59
Message-ID: 4d5688f4-9582-4093-8448-e1867bc9e2bc@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 12/6/23 11:28, Sehrope Sarkuni wrote:
> Big +1 to this overall feature.

cool!

> Regarding the defaults for the output, I think JSON lines (rather than a
> JSON array of objects) would be preferred. It's more natural to combine
> them and generate that type of data on the fly rather than forcing
> aggregation into a single object.

So that is +2 (Sehrope and me) for the status quo (JSON lines), and +2
(Andrew and Davin) for defaulting to json arrays. Anyone else want to
weigh in on that issue?

> Couple more features / use cases come to mind as well. Even if they're
> not part of a first round of this feature I think it'd be helpful to
> document them now as it might give some ideas for what does make that
> first cut:
>
> 1. Outputting a top level JSON object without the additional column
> keys. IIUC, the top level keys are always the column names. A common use
> case would be a single json/jsonb column that is already formatted
> exactly as the user would like for output. Rather than enveloping it in
> an object with a dedicated key, it would be nice to be able to output it
> directly. This would allow non-object results to be outputted as well
> (e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
> structured, I think this would play nice with the JSON lines v.s. array
> concept.
>
> COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> SOME_OPTION_TO_NOT_ENVELOPE)
> {"foo":1}
> {"foo":2}
> {"foo":3}

Your example does not match what you describe, or do I misunderstand? I
thought your goal was to eliminate the repeated "foo" from each row...

> 2. An option to ignore null fields so they are excluded from the output.
> This would not be a default but would allow shrinking the total size of
> the output data in many situations. This would be recursive to allow
> nested objects to be shrunk down (not just the top level). This might be
> worthwhile as a standalone JSON function though handling it during
> output would be more efficient as it'd only be read once.
>
> COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
> generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
> {}
> {"foo":2}
> {"foo":3}

clear enough I think

> 3. Reverse of #2 when copying data in to allow defaulting missing fields
> to NULL.

good to record the ask, but applies to a different feature (COPY FROM
instead of COPY TO).

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sehrope Sarkuni 2023-12-06 21:36:02 Re: Emitting JSON to file using COPY TO
Previous Message Andrew Dunstan 2023-12-06 21:03:12 Re: Emitting JSON to file using COPY TO

Browse pgsql-hackers by date

  From Date Subject
Next Message Sehrope Sarkuni 2023-12-06 21:36:02 Re: Emitting JSON to file using COPY TO
Previous Message Matthias van de Meent 2023-12-06 21:20:03 Re: automating RangeTblEntry node support