Re: New "single" COPY format

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "jian he" <jian(dot)universality(at)gmail(dot)com>
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New "single" COPY format
Date: 2024-11-10 07:32:30
Message-ID: ea9a8793-23e0-4d28-86e9-dee922cdb5bf@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 10, 2024, at 05:55, David G. Johnston wrote:
> On Saturday, November 9, 2024, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>> <para>
>> The <literal>list</literal> format does not distinguish a
>> <literal>NULL</literal>
>> value from an empty string. Empty lines are imported as empty strings, not
>> as <literal>NULL</literal> values.
>> </para>
>> we only mentioned import, not export (COPY TO) dealing with
>> NULL value.
>>
>
> Yeah, while not being able to distinguish between the two is consistent
> with the list format’s premise/design the choice would need to resolve
> to the null value in order to continue to be data-type agnostic. We’d
> simply have to note for the text types that empty strings in lists are
> not supported, and if encountered will be resolved to a null value.

Seems like we have two options to decide between, both with pros and cons.

For full reversibility, we can't support both NULL values and the empty string.

To make a sound design decision here, I think we should test both options
against all real-world use-cases we can come up with.

The use-cases I can think of are:

1) Arbitrary unstructured text lists, where each line could be any text string
2) JSONL, where each line is a valid JSON value, and cannot be an empty string

Option A:
COPY TO: Empty string field gets exported as an empty line. NULL field is an error.
COPY FROM: Empty line is imported as an empty string.

Option B:
COPY TO: NULL field gets exported as an empty line. Empty string field is an error.
COPY FROM: Empty line is imported as a NULL value.

I think Option A seems more useful, because:

1) Arbitrary text files, very often contain empty lines to separate sections from each other.
2) JSONL cannot contain empty lines, they are an error: https://jsonlines.org/validator/

Nothing implemented yet, awaiting opinions.

/Joel

[1] https://jsonlines.org/validator/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2024-11-10 07:48:56 Re: New "single" COPY format
Previous Message Joel Jacobson 2024-11-10 07:05:54 Re: New "single" COPY format