Re: Fixing backslash dot for COPY FROM...CSV

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing backslash dot for COPY FROM...CSV
Date: 2024-01-16 14:42:45
Message-ID: 897e2a6d-65b7-468f-b212-de3b07b4dac4@manitou-mail.org
Lists: pgsql-hackers

Robert Haas wrote:

> Part of my hesitancy, I suppose, is that I don't
> understand why we even have this strange convention of making \.
> terminate the input in the first place -- I mean, why wouldn't that be
> done in some kind of out-of-band way, rather than including a special
> marker in the data?

The v3 protocol added the out-of-band method, but the v2 protocol
did not have it, and as far as I understand, this is the reason why
CopyReadLineText() must interpret \. as an end-of-data marker.

The v2 protocol was removed in pg14:
https://www.postgresql.org/docs/release/14.0/
<quote>
Remove server and libpq support for the version 2 wire protocol (Heikki
Linnakangas)
This was last used as the default in PostgreSQL 7.3 (released in 2002).
</quote>

Also, I hadn't noticed this before, but the current doc has this mention
that is relevant to this patch:

https://www.postgresql.org/docs/current/protocol-changes.html
"Summary of Changes since Protocol 2.0"
<quote>
COPY data is now encapsulated into CopyData and CopyDone
messages. There is a well-defined way to recover from errors during
COPY. The special “\.” last line is not needed anymore, and is not
sent during COPY OUT. (It is still recognized as a terminator during
COPY IN, but its use is deprecated and will eventually be removed.)
</quote>
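
To make the in-band vs. out-of-band distinction concrete, here is a
minimal sketch in Python of the v3 framing: each CopyData message is a
'd' byte, a 4-byte length (which counts itself), then the payload, and
CopyDone is a 'c' byte with length 4. The helper names (copy_data,
copy_done, read_copy_stream) are made up for illustration and this is
not the server's code; the point is only that the end of the stream is
found from the message type, without looking inside the payload.

```python
import struct
from typing import List


def copy_data(payload: bytes) -> bytes:
    # CopyData: type byte 'd', Int32 length (includes the 4 length bytes), payload
    return b"d" + struct.pack("!I", 4 + len(payload)) + payload


def copy_done() -> bytes:
    # CopyDone: type byte 'c', Int32 length of 4, no payload
    return b"c" + struct.pack("!I", 4)


def read_copy_stream(buf: bytes) -> List[bytes]:
    """Collect CopyData payloads until CopyDone; payload bytes are never inspected."""
    payloads = []
    pos = 0
    while pos < len(buf):
        msg_type = buf[pos:pos + 1]
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        body = buf[pos + 5:pos + 1 + length]
        pos += 1 + length
        if msg_type == b"c":
            break  # out-of-band end of data
        if msg_type != b"d":
            raise ValueError("unexpected message type: %r" % msg_type)
        payloads.append(body)
    return payloads


# A payload containing a lone "\." line is just data at this level.
stream = (
    copy_data(b"a,b\n")
    + copy_data(b"\\.\n")
    + copy_data(b"c,d\n")
    + copy_done()
)
rows = read_copy_stream(stream)
```

With this framing, the "\." payload comes back like any other chunk,
and only the CopyDone message ends the stream.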

For the server-side part, what the present patch essentially does is
stop recognizing "\." as a terminator, as this paragraph anticipates,
but only for CSV, not for TEXT.

> Hmm. Looking at the rest of the patch, it seems like you're removing
> the logic that prevents us from interpreting
>
> \. lksdghksdhgjskdghjs
>
> as an end-of-file while in CSV mode. But I would have thought based on
> what problem you're trying to fix that you would have wanted to keep
> that logic and further restrict it so that it only applies when not
> within a quoted string.
>
> Maybe I'm misunderstanding what bug you're trying to fix?

The fix is that \. is no longer recognized as special in CSV, whether
alone on a line or not, and whether in a quoted section or not.
It's always interpreted as data, as I imagine it would have been from
the start if the v2 protocol could have handled it. This is why the
patch consists mostly of removing code and simplifying comments.
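
As a rough illustration of the behavior change, here is a toy model in
Python. It is not PostgreSQL's implementation: parse_csv and its
backslash_dot_is_terminator flag are hypothetical, and the "old" branch
is a deliberate simplification that cuts the input at the first line
consisting solely of \. before any CSV parsing happens. The "new"
branch does nothing special and leaves \. to the CSV parser as data.

```python
import csv
import io
from typing import List


def parse_csv(text: str, backslash_dot_is_terminator: bool) -> List[List[str]]:
    """Toy model only: optionally treat a lone "\\." line as end of input."""
    if backslash_dot_is_terminator:
        kept = []
        for line in text.splitlines(keepends=True):
            if line.rstrip("\r\n") == "\\.":
                break  # simplified old behavior: stop at the marker line
            kept.append(line)
        text = "".join(kept)
    # Python's default csv dialect has no escape character,
    # so a backslash is ordinary data here.
    return list(csv.reader(io.StringIO(text)))


sample = "a,1\n\\.\nb,2\n"
old_rows = parse_csv(sample, backslash_dot_is_terminator=True)
new_rows = parse_csv(sample, backslash_dot_is_terminator=False)

# A "\." line inside a quoted, multi-line field is also plain data
# when nothing treats it as a terminator.
quoted = 'x,"line1\n\\.\nline2"\n'
quoted_rows = parse_csv(quoted, backslash_dot_is_terminator=False)
```

In this model the marker-aware variant silently drops the "b,2" row,
while the variant without the marker returns all three rows and keeps
the quoted field intact.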

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
