Re: confusing / inefficient "need_transcoding" handling in copy

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Sutou Kouhei <kou(at)clear-code(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: confusing / inefficient "need_transcoding" handling in copy
Date: 2024-02-09 00:36:28
Message-ID: ZcVzjGWFobGpNrxs@paquier.xyz
Lists: pgsql-hackers

On Thu, Feb 08, 2024 at 10:25:07AM +0200, Heikki Linnakangas wrote:
> There's no validation, just conversion. I'd suggest:
>
> "Set up encoding conversion info if the file and server encodings differ
> (see also pg_server_to_any)."
>
> Other than that, +1

Cool. I've used your wording and applied that on HEAD.

> BTW, I can see an optimization opportunity even if the encodings differ:
> Currently, CopyAttributeOutText() calls pg_server_to_any(), and then grovels
> through the string to find any characters that need to be quoted. You could
> do it the other way round and handle quoting before the conversion. That has
> two benefits:
>
> 1. You don't need the strlen() call, because you just scanned through the
> string so you already know its length.
> 2. You don't need to worry about 'encoding_embeds_ascii' when you operate on
> the server encoding.

That sounds right. Still, it looks like there would be cases where
you'd need the strlen() call if !encoding_embeds_ascii.
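
To make the quoted idea concrete, here is a rough standalone sketch of
"quote first, convert afterwards". It is not the actual
CopyAttributeOutText() code: the function name, the caller-supplied
buffer, and the reduced set of escapes are all assumptions made for
illustration. The scan runs on the server-encoding string, so it can
compare bytes against ASCII directly, and it returns the escaped length
so that a later conversion step could be given the length without a
separate strlen().

```c
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL source code.
 *
 * Escapes a NUL-terminated string held in the server encoding for
 * text-format output, in a single pass.  dst must have room for
 * 2 * strlen(src) + 1 bytes.  Returns the number of bytes written,
 * not counting the terminating NUL.
 *
 * Only a small, illustrative set of characters is escaped here.
 */
static size_t
escape_text_server_encoding(const char *src, char *dst, char delim)
{
    char       *out = dst;
    const char *p;

    for (p = src; *p != '\0'; p++)
    {
        char        c = *p;

        switch (c)
        {
            case '\n':
                *out++ = '\\';
                *out++ = 'n';
                break;
            case '\r':
                *out++ = '\\';
                *out++ = 'r';
                break;
            case '\t':
                *out++ = '\\';
                *out++ = 't';
                break;
            case '\\':
                *out++ = '\\';
                *out++ = '\\';
                break;
            default:
                /* Any other byte is copied; the delimiter gets a backslash. */
                if (c == delim)
                    *out++ = '\\';
                *out++ = c;
                break;
        }
    }
    *out = '\0';

    /* The length falls out of the scan, so no strlen() is needed. */
    return (size_t) (out - dst);
}
```

A caller would then hand the escaped buffer and the returned length to
the conversion step (pg_server_to_any() in the real code) only when the
file and server encodings differ. Whether that ordering is a net win in
practice would need measuring.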
--
Michael
