From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "jian he" <jian(dot)universality(at)gmail(dot)com> |
Cc: | "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: New "single" COPY format |
Date: | 2024-11-10 08:00:44 |
Message-ID: | 078ef6bb-3c18-4342-a4fb-81556179546d@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Nov 10, 2024, at 08:48, Joel Jacobson wrote:
> On Sun, Nov 10, 2024, at 08:32, Joel Jacobson wrote:
>> Option A:
>> COPY TO: Empty string field gets exported as an empty line. NULL field
>> is an error.
>> COPY FROM: Empty line is imported as an empty string.
>>
>> Option B:
>> COPY TO: NULL field gets exported as an empty line. Empty string field
>> is an error.
>> COPY FROM: Empty line is imported as a NULL value.
>>
>> I think Option A seems more useful, because:
>>
>> 1) Arbitrary text files, very often contain empty lines to separate
>> sections from each other.
>> 2) JSONL cannot contain empty lines, they are an error:
>> https://jsonlines.org/validator/
>
> David, I forgot about your NUL idea, so there is also a third option.
To avoid confusion, I should have been clear that the below idea
is just based on your NUL idea, it's not the same idea per se,
since your was about newline handling in textual types.
> Option C:
> COPY TO: NULL field gets exported as a the NUL byte. Empty string field
> is an empty line.
> COPY FROM: Empty line is imported as an empty string. NUL byte is
> imported as a NULL value.
>
> For arbitrary text files, Option C would work fine, since they usually
> don't contain NUL bytes, and if they do, then it seems useful to be
> handle to deal with such files in some way, even if it can't be known
> NUL always means NULL, then we could at least import such files, and
> then do some post-processing of the imported data, to get the desired
> result.
>
> For JSONL, Option C would also work fine, since they can't contain NUL bytes.
>
> It's a bit of a hack, but I kinda like it., since it seems like the
> only option without an error situation.
> Maybe OK if we add one a cautionary <note> the docs?
Also, we could emit NOTICE messages,
upon both COPY TO and COPY FROM,
to increase the chances of users understanding the semantics:
COPY TO:
NOTICE: NULL values encountered in data, represented as NUL bytes in output
COPY FROM:
NOTICE: NUL bytes encountered in data, stored as NULL values
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Joel Jacobson | 2024-11-10 08:32:28 | Re: New "single" COPY format |
Previous Message | Joel Jacobson | 2024-11-10 07:48:56 | Re: New "single" COPY format |