Re: confusing / inefficient "need_transcoding" handling in copy

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Sutou Kouhei <kou(at)clear-code(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: confusing / inefficient "need_transcoding" handling in copy
Date: 2024-02-06 04:49:38
Message-ID: ZcG6YuQ15j3H0whd@paquier.xyz
Lists: pgsql-hackers

On Mon, Feb 05, 2024 at 06:05:04PM -0800, Andres Freund wrote:
> I don't really understand why we need to validate anything during COPY TO?
> Which is good, because it turns out that we don't actually validate anything,
> as pg_server_to_any() returns without doing anything if the encoding matches:
>
>     if (encoding == DatabaseEncoding->encoding ||
>         encoding == PG_SQL_ASCII)
>         return unconstify(char *, s);   /* assume data is valid */
>
> This means that the strlen() we do in the call to pg_server_to_any(), which on
> its own takes 14.25% of the cycles, computes something that will never be
> used.

Indeed, that's wasting cycles for nothing when the client and server
encodings match.
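
To illustrate the point, here is a minimal standalone sketch, not the
actual copy code. All names (CopyOutSketch, sketch_begin,
sketch_convert, sketch_out_attribute) are hypothetical, encodings are
plain ints, and the conversion is a placeholder. It only shows that
when the flag is decided once from an encoding comparison, the matching
case never pays for strlen():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the COPY TO state; not PostgreSQL's struct. */
typedef struct CopyOutSketch
{
    int     file_encoding;      /* encoding requested for the output */
    int     server_encoding;    /* encoding the data is stored in */
    bool    need_transcoding;   /* decided once, before any row is sent */
    size_t  strlen_calls;       /* counts per-attribute length scans */
} CopyOutSketch;

static void
sketch_begin(CopyOutSketch *st, int file_encoding, int server_encoding)
{
    st->file_encoding = file_encoding;
    st->server_encoding = server_encoding;
    /* Assumption: for output, conversion matters only if encodings differ. */
    st->need_transcoding = (file_encoding != server_encoding);
    st->strlen_calls = 0;
}

/* Placeholder conversion; a real one would return a converted copy. */
static const char *
sketch_convert(const char *s, size_t len)
{
    (void) len;
    return s;
}

static const char *
sketch_out_attribute(CopyOutSketch *st, const char *s)
{
    assert(s != NULL);

    if (!st->need_transcoding)
        return s;               /* no strlen(), no conversion call */

    st->strlen_calls++;
    return sketch_convert(s, strlen(s));
}
```

With matching encodings, sketch_out_attribute() returns its input
pointer untouched and strlen_calls stays at zero; the length scan only
happens once the two encodings differ.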

> Unsurprisingly, only doing transcoding when encodings differ yields a sizable
> improvement, about 18% for [2].
>
> I haven't yet dug into the code history. One guess is that this should only
> have been set this way for COPY FROM.

Looking at the git history, this looks like an oversight of c61a2f58418e,
which added the condition on pg_database_encoding_max_length(), no?
Adding Tom and Ishii-san, even if this comes from 2005.
--
Michael
