From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Joel Jacobson <joel(at)compiler(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: New "single" COPY format |
Date: | 2024-12-19 13:40:05 |
Message-ID: | 0b70a518-f6cc-483b-8e1c-51a8585f0f72@dunslane.net |
Lists: | pgsql-hackers |
On 2024-12-16 Mo 10:09 AM, Joel Jacobson wrote:
> Hi hackers,
>
> After further consideration, I'm withdrawing the patch.
> Some fundamental questions remain unresolved:
>
> - Should round-trip fidelity be a strict goal? By "round-trip fidelity",
> I mean that data exported and then re-imported should yield exactly
> the original values, including the distinction between NULL and empty strings.
> - If round-trip fidelity is a requirement, how do we distinguish NULL from empty
> strings without delimiters or escapes?
> - Is automatic newline detection (as in "csv" and "text") more valuable than
> the ability to embed \r (CR) characters?
> - Would it be better to extend the existing COPY options rather than introducing
> a new format?
> - Or should we consider a JSONL format instead, one that avoids the NULL/empty
> string problem entirely?
>
> No clear solution or consensus has emerged. For now, I'll step back from the
> proposal. If someone wants to revisit this later, I'd be happy to contribute.
>
> Thanks again for all the feedback and consideration.
>
We seem to have got seriously into the weeds here. I'd be sorry to see
this dropped. After all, it's not something new, and while we have a
sort of workaround for "one json doc per line", it's far from obvious
and, except in a few blog posts, undocumented.
I think we're trying to be far too general here, in the absence of
more general use cases. The ones I recall having encountered in the wild
are:
. one json datum per line
. one json document per file
. a sequence of json documents per file
The last one is hard to deal with, and I think I've only seen it once or
twice, so I suggest leaving it aside for now.
Notice these are all JSON. I could imagine XML might have similar
requirements, but I encounter it extremely rarely.
Regarding NULL, an empty string is not a valid JSON literal, so there
should be no confusion there. It is valid for XML, though.
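To illustrate the NULL point, here is a minimal Python sketch, not anything from the patch. It assumes a hypothetical JSONL reader in which an empty line stands for SQL NULL; the names SQL_NULL and decode_line are made up for this example. Because an empty string is never valid JSON, the empty line cannot collide with any real datum, including the JSON empty string "" and the JSON literal null.

```python
import json

# Hypothetical sentinel for SQL NULL, kept distinct from JSON null (None).
SQL_NULL = object()


def decode_line(line: str):
    """Decode one line of a JSONL file into a Python value.

    Assumption for this sketch: an empty line means SQL NULL.
    Any non-empty line must be a valid JSON datum.
    """
    if line == "":
        return SQL_NULL
    return json.loads(line)


if __name__ == "__main__":
    for raw in ["", '""', "null", '{"a": 1}']:
        value = decode_line(raw)
        label = "SQL NULL" if value is SQL_NULL else repr(value)
        print(f"{raw!r} -> {label}")
```

The same trick would not work for XML, where empty content is valid.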
Given all that I think restricting ourselves to just the JSON cases, and
possibly just to JSONL, would be perfectly reasonable.
Regarding CR, it's not a valid character in a JSON string item, although
it is valid in JSON whitespace. I would not treat it as magical unless
it immediately precedes an NL. That gives rise to a very slight
ambiguity, but I think it's one we could live with.
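The CR rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: split_records is a hypothetical helper, records are terminated by NL, and a CR is dropped only when it is the last byte before an NL. A CR anywhere else is left alone as ordinary JSON whitespace.

```python
import json
from typing import List


def split_records(data: bytes) -> List[bytes]:
    """Split a JSONL byte stream into records.

    NL terminates a record. A CR is treated as part of the line
    terminator only when it immediately precedes an NL; any other CR
    stays in the record.
    """
    records = data.split(b"\n")
    # A trailing NL leaves one empty piece at the end; drop it.
    if records and records[-1] == b"":
        records.pop()
    # Strip at most one CR, and only from the very end of each record.
    return [r[:-1] if r.endswith(b"\r") else r for r in records]


if __name__ == "__main__":
    stream = b'{"a":1}\r\n[1,\r2]\n'
    for record in split_records(stream):
        print(record, "->", json.loads(record))
```

The ambiguity is visible here: a datum written with a trailing CR as whitespace, followed by NL, is indistinguishable from a CRLF line ending, so that CR is lost. Since it could only ever be whitespace, the decoded value is unaffected.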
As for what the format is called, I don't like the "LIST" proposal much,
even for the general case. Seems too close to an array.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com