From: | Sehrope Sarkuni <sehrope(at)jackdb(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Emitting JSON to file using COPY TO |
Date: | 2023-12-06 16:28:33 |
Message-ID: | CAH7T-aoWF9oHuUC2PJ303YVRFurkLMPayHawZayqdhtT2fpEHg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Big +1 to this overall feature.
This is something I've wanted for a long time as well. While it's possible
to use a COPY with text output for a trivial case, the double escaping
falls apart quickly for arbitrary data. It's really only usable when you
know exactly what you are querying and know it will not be a problem.
Regarding the defaults for the output, I think JSON lines (rather than a
JSON array of objects) would be preferred. It's more natural to combine
them and generate that type of data on the fly rather than forcing
aggregation into a single object.
Couple more features / use cases come to mind as well. Even if they're not
part of a first round of this feature I think it'd be helpful to document
them now as it might give some ideas for what does make that first cut:
1. Outputting a top level JSON object without the additional column keys.
IIUC, the top level keys are always the column names. A common use case
would be a single json/jsonb column that is already formatted exactly as
the user would like for output. Rather than enveloping it in an object with
a dedicated key, it would be nice to be able to output it directly. This
would allow non-object results to be outputted as well (e.g., lines of JSON
arrays, numbers, or strings). Due to how JSON is structured, I think this
would play nice with the JSON lines v.s. array concept.
COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}
2. An option to ignore null fields so they are excluded from the output.
This would not be a default but would allow shrinking the total size of the
output data in many situations. This would be recursive to allow nested
objects to be shrunk down (not just the top level). This might be
worthwhile as a standalone JSON function though handling it during output
would be more efficient as it'd only be read once.
COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
{}
{"foo":2}
{"foo":3}
3. Reverse of #2 when copying data in to allow defaulting missing fields to
NULL.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-12-06 16:28:59 | Re: Emitting JSON to file using COPY TO |
Previous Message | Tom Lane | 2023-12-06 16:26:35 | Re: Emitting JSON to file using COPY TO |
From | Date | Subject | |
---|---|---|---|
Next Message | Tristan Partin | 2023-12-06 16:28:49 | Re: Clean up some signal usage mainly related to Windows |
Previous Message | Andrew Dunstan | 2023-12-06 16:27:44 | Re: Remove MSVC scripts from the tree |