From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Sutou Kouhei <kou(at)clear-code(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Subject: | Re: confusing / inefficient "need_transcoding" handling in copy |
Date: | 2024-02-06 04:49:38 |
Message-ID: | ZcG6YuQ15j3H0whd@paquier.xyz |
Lists: | pgsql-hackers |
On Mon, Feb 05, 2024 at 06:05:04PM -0800, Andres Freund wrote:
> I don't really understand why we need to validate anything during COPY TO?
> Which is good, because it turns out that we don't actually validate anything,
> as pg_server_to_any() returns without doing anything if the encoding matches:
>
>     if (encoding == DatabaseEncoding->encoding ||
>         encoding == PG_SQL_ASCII)
>         return unconstify(char *, s);    /* assume data is valid */
>
> This means that the strlen() we do in the call to pg_server_to_any(), which on
> its own takes 14.25% of the cycles, computes something that will never be
> used.
Indeed, that's wasting cycles for nothing when the client and server
encodings match.
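To make the cost concrete, here is a minimal, self-contained sketch of the pattern under discussion. It is not the PostgreSQL code: toy_encoding, toy_server_to_any(), counted_strlen(), emit_always() and emit_guarded() are hypothetical stand-ins, and the conversion itself is stubbed out. It only shows that the length is computed and then thrown away when the encodings match, and that checking the encodings first avoids the strlen() entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding ids, for illustration only. */
typedef enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1 } toy_encoding;

/* Counts how often the length is computed, so the waste is observable. */
static size_t strlen_calls = 0;

static size_t
counted_strlen(const char *s)
{
    strlen_calls++;
    return strlen(s);
}

/*
 * Stand-in for a server-to-client conversion routine.  Like the snippet
 * quoted above, it returns the input untouched when the encodings match,
 * so "len" is never looked at on that path.
 */
static const char *
toy_server_to_any(const char *s, size_t len,
                  toy_encoding server, toy_encoding target)
{
    assert(s != NULL);
    (void) len;
    if (target == server || target == ENC_SQL_ASCII)
        return s;               /* assume data is valid */
    /* A real conversion would happen here; the toy just returns s. */
    return s;
}

/* Always calls the conversion routine, paying for strlen() every time. */
static const char *
emit_always(const char *s, toy_encoding server, toy_encoding client)
{
    return toy_server_to_any(s, counted_strlen(s), server, client);
}

/* Checks the encodings first and skips strlen() when nothing would be done. */
static const char *
emit_guarded(const char *s, toy_encoding server, toy_encoding client)
{
    assert(s != NULL);
    if (server == client)
        return s;
    return toy_server_to_any(s, counted_strlen(s), server, client);
}
```

With matching encodings, emit_always() still bumps strlen_calls once per value, while emit_guarded() leaves it untouched; both return the original pointer.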
> Unsurprisingly, only doing transcoding when encodings differ yields a sizable
> improvement, about 18% for [2].
>
> I haven't yet dug into the code history. One guess is that this should only
> have been set this way for COPY FROM.
Looking at the git history, this looks like an oversight of c61a2f58418e
that has added the condition on pg_database_encoding_max_length(), no?
Adding Tom and Ishii-san, even if this comes from 2005.
--
Michael