Re: New "single" COPY format

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "jian he" <jian(dot)universality(at)gmail(dot)com>
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New "single" COPY format
Date: 2024-11-10 08:00:44
Message-ID: 078ef6bb-3c18-4342-a4fb-81556179546d@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 10, 2024, at 08:48, Joel Jacobson wrote:
> On Sun, Nov 10, 2024, at 08:32, Joel Jacobson wrote:
>> Option A:
>> COPY TO: Empty string field gets exported as an empty line. NULL field
>> is an error.
>> COPY FROM: Empty line is imported as an empty string.
>>
>> Option B:
>> COPY TO: NULL field gets exported as an empty line. Empty string field
>> is an error.
>> COPY FROM: Empty line is imported as a NULL value.
>>
>> I think Option A seems more useful, because:
>>
>> 1) Arbitrary text files, very often contain empty lines to separate
>> sections from each other.
>> 2) JSONL cannot contain empty lines, they are an error:
>> https://jsonlines.org/validator/
>
> David, I forgot about your NUL idea, so there is also a third option.

To avoid confusion, I should have been clear that the below idea
is just based on your NUL idea, it's not the same idea per se,
since your was about newline handling in textual types.

> Option C:
> COPY TO: NULL field gets exported as a the NUL byte. Empty string field
> is an empty line.
> COPY FROM: Empty line is imported as an empty string. NUL byte is
> imported as a NULL value.
>
> For arbitrary text files, Option C would work fine, since they usually
> don't contain NUL bytes, and if they do, then it seems useful to be
> handle to deal with such files in some way, even if it can't be known
> NUL always means NULL, then we could at least import such files, and
> then do some post-processing of the imported data, to get the desired
> result.
>
> For JSONL, Option C would also work fine, since they can't contain NUL bytes.
>
> It's a bit of a hack, but I kinda like it., since it seems like the
> only option without an error situation.
> Maybe OK if we add one a cautionary <note> the docs?

Also, we could emit NOTICE messages,
upon both COPY TO and COPY FROM,
to increase the chances of users understanding the semantics:

COPY TO:
NOTICE: NULL values encountered in data, represented as NUL bytes in output

COPY FROM:
NOTICE: NUL bytes encountered in data, stored as NULL values

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2024-11-10 08:32:28 Re: New "single" COPY format
Previous Message Joel Jacobson 2024-11-10 07:48:56 Re: New "single" COPY format