Re: Fixing backslash dot for COPY FROM...CSV

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing backslash dot for COPY FROM...CSV
Date: 2024-04-06 17:03:12
Message-ID: 2077673.1712422992@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> So the current behavior is that \. that is on the end of a line,
> but is not the whole line, is silently discarded and we keep going.

> All versions throw "end-of-copy marker corrupt" if there is
> something after \. on the same line.

> This is sufficiently weird that I'm starting to come around to
> Daniel's original proposal that we just drop the server's recognition
> of \. altogether (which would allow removal of some dozens of lines of
> complicated and now known-buggy code).
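
As a rough illustration of the three cases summarized in the quoted text, here
is a toy Python model. It is not the server's parser: the names classify_line
and CopyMarkerError are hypothetical, backslash escaping is ignored, and which
COPY format or server version each case applies to is not modeled. Treating
any "\." followed by more text on the line as the error case is an assumption
about how to read the summary.

```python
# Toy model of the "\." cases described above; NOT PostgreSQL's actual parser.
END_MARKER = "\\."  # the two characters backslash and dot


class CopyMarkerError(ValueError):
    """Stands in for the server's "end-of-copy marker corrupt" error."""


def classify_line(line: str) -> str:
    """Classify one input line (without its newline).

    Returns "end" if the line is exactly the marker, "discard-marker" if the
    marker sits at the end of a longer line, and "data" if no marker appears.
    Raises CopyMarkerError if anything follows the marker on the same line.
    """
    if line == END_MARKER:
        return "end"
    pos = line.find(END_MARKER)
    if pos == -1:
        return "data"
    if pos + len(END_MARKER) == len(line):
        # Marker at end of a non-empty line: silently dropped, input continues.
        return "discard-marker"
    raise CopyMarkerError("end-of-copy marker corrupt")
```

The middle case is the surprising one: the marker vanishes from the data
without any error or end-of-input signal.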

I experimented with that and soon ran into a nasty roadblock: it
breaks dump/restore, because pg_dump includes a "\." line after
COPY data whether or not it really needs one. Worse, that's
implemented by including the "\." line into the archive format,
so that existing dump files contain it. Getting rid of it would
require an archive format version bump, plus some hackery to allow
removal of the line when reading old dump files.
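
A minimal sketch of what such a compatibility shim might look like, written in
Python purely for illustration. The function name strip_trailing_eod is
hypothetical, the input is assumed to be the COPY data block already extracted
as text, and nothing here reflects pg_dump's or pg_restore's real archive
handling.

```python
def strip_trailing_eod(copy_data: str) -> str:
    """Drop a final "\\." line from a block of COPY data, if one is present.

    Only the last line is examined, so a "\\." appearing earlier in the data
    is left untouched. Line endings of the remaining lines are preserved.
    """
    lines = copy_data.splitlines(keepends=True)
    if lines and lines[-1].rstrip("\r\n") == "\\.":
        lines.pop()
    return "".join(lines)
```

For example, strip_trailing_eod("1\ta\n2\tb\n\\.\n") yields "1\ta\n2\tb\n",
while data that never had the marker passes through unchanged.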

While that's surely doable with enough effort, it's not the kind
of thing to be undertaking with less than 2 days to feature freeze.
Not to mention that I'm not sure we have consensus to do it at all.

More fun stuff: PQgetline actually invents a "\." line when it
sees server end-of-copy, and we tell users of that function to
check for that, rather than an out-of-band return value, to detect EOF.
It looks like we have no callers of that in the core distro,
but do we want to deprecate it completely?
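
The in-band convention described above can be modeled in a few lines of
Python. This is only a sketch of the pattern, not libpq: the names
lines_with_invented_marker and read_until_marker are hypothetical, and a plain
iterable stands in for the server's data stream.

```python
from typing import Iterable, Iterator, List

EOD = "\\."  # the two characters backslash and dot


def lines_with_invented_marker(server_rows: Iterable[str]) -> Iterator[str]:
    """Yield every row, then a synthetic marker line once the source ends.

    The source itself never contains the marker; it is added here, which
    mirrors the "invented" line described above.
    """
    for row in server_rows:
        yield row
    yield EOD


def read_until_marker(lines: Iterable[str]) -> List[str]:
    """Caller side: detect end of data by comparing each line to the marker."""
    rows: List[str] = []
    for line in lines:
        if line == EOD:
            break
        rows.append(line)
    return rows
```

A caller written this way never sees an out-of-band EOF indication, so it
depends on the marker line continuing to appear.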

So I feel like we need to put this patch on the shelf for the moment
and come back to it early in v18. Although it seems reasonably clear
what to do on the CSV side of things, it's very much less clear what
to do about text-format handling of EOD markers, and I don't want to
change one of those things in v17 and the other in v18. Also it
seems like there are more dependencies on "\." than we realized.

There could be an argument for applying just the psql change now,
to remove its unnecessary sending of "\.". That won't break
anything and it would give us at least one year's leg up on
compatibility issues.

regards, tom lane
