Re: Dead encoding conversion functions

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Dead encoding conversion functions
Date: 2019-06-15 18:07:32
Message-ID: 20190615180732.GA239428@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree, and immediately tripped over an interesting factoid:
> the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
> untested ... not because the regression tests don't try, but because
> those conversions are unreachable. pg_do_encoding_conversion() and
> its sister functions have hard-wired fast paths for any conversion
> in which the source or target encoding is SQL_ASCII, so that an
> encoding conversion function declared for such a case will never
> be used.

> This situation seems kinda silly. My inclination is to delete
> these functions as useless, but I suppose another approach is
> to suppress the fast paths if there's a declared conversion function.
> (Doing so would likely require added catalog lookups in places we
> might not want them...)

Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR
when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would
otherwise send the client any character outside 7-bit ASCII. That's fairly
defensible, but doing it for only UTF8 and MULE_INTERNAL is not. So if we
like the ascii_to_utf8() behavior, I think the action would be to replace the
fast path with an encoding-independent verification that all bytes are 7-bit
ASCII. (The check would not apply when both server_encoding and
client_encoding are SQL_ASCII, of course.) Alternately, one might prefer to
replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8
case, we'd allow byte sequences that are valid UTF8, even though the validity
may be a coincidence and mojibake may ensue. SQL_ASCII is for being casual
about encoding, so it's not clear to me whether or not either prospective
behavior change would be an improvement. However, I do find it clear to
delete ascii_to_utf8() and ascii_to_mic().

> If we do delete them as useless, it might also be advisable to change
> CreateConversionCommand() to refuse creation of conversions to/from
> SQL_ASCII, to prevent future confusion.

Sounds good.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2019-06-15 18:37:59 Re: pg_upgrade can result in early wraparound on databases with high transaction load
Previous Message Tom Lane 2019-06-15 17:01:41 Re: Extracting only the columns needed for a query