From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Augment every test postgresql.conf |
Date: | 2019-05-12 08:13:47 |
Message-ID: | 20190512081347.GA1173385@rfd.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, May 11, 2019 at 10:43:59PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > Pushed. This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> > byte sequence for encoding "UTF8"' errors. That's because log_connections
> > renders this 010_dump_connstr.pl solution insufficient:
>
> Ugh.
>
> > 4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
> > return NULL. The caller will always send untranslated bytes to write() or
> > ReportEventA(). This seems consistent with the SQL_ASCII concept and with
> > pg_do_encoding_conversion()'s interpretation of SQL_ASCII.
>
> > 5. When including a datname or rolname value in a message, hex-escape
> > non-ASCII bytes. They are byte sequences, not text of known encoding.
> > This preserves the most information, but it's overkill and ugly in the
> > probably-common case of one encoding across all databases of a cluster.
>
> > I'm inclined to do (1) in back branches and (4) in HEAD only. (If starting
> > fresh today, I would store the encoding of each rolname and dbname or just use
> > UTF8 for those particular fields.) Other preferences?
>
> I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
> back-patching that.
Okay. Absent objections, I'll just do it that way.
> Taking a wider view, this seems closely related
> to something I've been thinking about in connection with the recent
> pg_stat_activity contretemps: that mechanism is also shoving strings
> across database boundaries without a lot of worry about encodings.
> Maybe we should try to develop a common solution.
>
> One difference from the datname/rolname situation is that for
> pg_stat_activity we can know the source encoding --- we aren't storing
> it now, but we easily could. If we're thinking of a future solution
> only, adding a "name encoding" field to relevant shared catalogs makes
> sense perhaps. Alternatively, requiring names in shared catalogs to be
> UTF8 might be a reasonable answer too.
>
> In all these cases, throwing an error when we can't translate a character
> into the destination encoding is not very pleasant. For pg_stat_activity,
> I was imagining that translating such characters to '?' might be the best
> answer. I don't know if we can get away with that for the datname/rolname
> case --- at the very least, it opens problems with apparent duplication of
> names that should be unique. I don't much like your hex-encoding answer,
> though; that has its own uniqueness-violation hazards, plus it's ugly.
Another case of byte sequence masquerading as text is pg_settings.setting.
In most contexts, it's important to convey exact values. Error messages can
use '?'. I wouldn't let dump/reload of a rolname corrupt it that way, and I
wouldn't recognize the '?' version for authentication. While
pg_stat_activity.query could use '?', I'd encourage adding bytea and encoding
columns for exact transmission. pg_stat_activity can't standardize on UTF8
without shrinking the set of valid queries or inaccurately reporting some,
neither of which is attractive.
datname/rolname could afford to be more prescriptive, since non-ASCII names
are full of bugs today. A useful consequence of UTF8 datname/rolname would be
today's "pg_dumpall --globals" remaining simple. If we were to support
arbitrary encodings with a "name encoding" field, the general-case equivalent
of "pg_dumpall --globals" would connect to several databases of different
encodings in order to dump all objects, perhaps even creating a temporary
database if no suitable-encoding database existed.
MULE_INTERNAL presents trouble since we don't have a UTF8<->MULE_INTERNAL
conversion. If we standardized cross-database strings on UTF8, it would be
impossible to read such strings, create roles, etc. from a MULE_INTERNAL
database. I suppose we'd either add the conversion or deprecate
MULE_INTERNAL, forbidding its use as the initdb encoding.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-05-12 08:57:23 | Re: Why is infinite_recurse test suddenly failing? |
Previous Message | Oleg Bartunov | 2019-05-12 07:00:38 | Re: PG 12 draft release notes |