From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Augment every test postgresql.conf |
Date: | 2019-05-12 02:43:59 |
Message-ID: | 25383.1557629039@sss.pgh.pa.us |
Lists: | pgsql-hackers |

Noah Misch <noah(at)leadboat(dot)com> writes:
> Pushed. This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> byte sequence for encoding "UTF8"' errors. That's because log_connections
> renders this 010_dump_connstr.pl solution insufficient:

Ugh.

> 4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
> return NULL. The caller will always send untranslated bytes to write() or
> ReportEventA(). This seems consistent with the SQL_ASCII concept and with
> pg_do_encoding_conversion()'s interpretation of SQL_ASCII.
> 5. When including a datname or rolname value in a message, hex-escape
> non-ASCII bytes. They are byte sequences, not text of known encoding.
> This preserves the most information, but it's overkill and ugly in the
> probably-common case of one encoding across all databases of a cluster.
> I'm inclined to do (1) in back branches and (4) in HEAD only. (If starting
> fresh today, I would store the encoding of each rolname and dbname or just use
> UTF8 for those particular fields.) Other preferences?

I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
back-patching that. Taking a wider view, this seems closely related
to something I've been thinking about in connection with the recent
pg_stat_activity contretemps: that mechanism is also shoving strings
across database boundaries without a lot of worry about encodings.
Maybe we should try to develop a common solution.

One difference from the datname/rolname situation is that for
pg_stat_activity we can know the source encoding --- we aren't storing
it now, but we easily could. If we're thinking of a future solution
only, adding a "name encoding" field to relevant shared catalogs makes
sense perhaps. Alternatively, requiring names in shared catalogs to be
UTF8 might be a reasonable answer too.
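
To make the difference concrete, here is a minimal Python sketch, not backend code. Python codecs stand in for PostgreSQL encodings, and the helper `render_for_reader` and the sample name are hypothetical. It shows that a byte string with no recorded encoding may not even be valid in the reader's encoding, while the same bytes stored with their source encoding can be converted cleanly.

```python
# Illustrative model only: Python codecs stand in for PostgreSQL encodings.
# render_for_reader() and the sample name are hypothetical.

def render_for_reader(raw: bytes, source_encoding: str, reader_encoding: str) -> bytes:
    """Convert bytes of a known source encoding into the reader's encoding."""
    text = raw.decode(source_encoding)
    return text.encode(reader_encoding)

# A name as it would be stored by a LATIN1 database.
latin1_name = "caf\u00e9".encode("latin-1")   # the bytes 63 61 66 e9

# Without a recorded encoding, a UTF8 reader can only take the bytes as-is,
# and a lone 0xE9 byte is not valid UTF-8.
try:
    latin1_name.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

# With the source encoding recorded, conversion is straightforward.
converted = render_for_reader(latin1_name, "latin-1", "utf-8")
```

The failure in the first case is the same kind of "invalid byte sequence" problem quoted at the top of this message. The second case is what becomes possible once the source encoding travels with the string.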
In all these cases, throwing an error when we can't translate a character
into the destination encoding is not very pleasant. For pg_stat_activity,
I was imagining that translating such characters to '?' might be the best
answer. I don't know if we can get away with that for the datname/rolname
case --- at the very least, it opens problems with apparent duplication of
names that should be unique. I don't much like your hex-encoding answer,
though; that has its own uniqueness-violation hazards, plus it's ugly.
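
One way these uniqueness hazards can arise is sketched below in Python. This is an illustration under assumptions, not a description of any proposed patch: `to_question_marks` and `hex_escape` are hypothetical helpers, and the escaping format (`\xNN` with backslash left unescaped) is assumed for the example.

```python
# Illustrative only: both helpers and the \xNN format are assumptions.

def to_question_marks(name: str, dest_encoding: str) -> str:
    """Replace characters that dest_encoding cannot represent with '?'."""
    return name.encode(dest_encoding, errors="replace").decode(dest_encoding)

def hex_escape(raw: bytes) -> str:
    """Render non-ASCII bytes as \\xNN; ASCII bytes, including backslash, pass through."""
    return "".join(chr(b) if b < 0x80 else "\\x%02x" % b for b in raw)

# Two distinct names that differ only in a character LATIN1 cannot represent.
shown_a = to_question_marks("user_\u0436", "latin-1")
shown_b = to_question_marks("user_\u044f", "latin-1")
question_mark_collision = (shown_a == shown_b)

# Two distinct names: one contains the raw byte 0xE9, the other contains
# the four ASCII characters backslash, 'x', 'e', '9'.
shown_c = hex_escape(b"caf\xe9")
shown_d = hex_escape(b"caf\\xe9")
hex_collision = (shown_c == shown_d)
```

In this sketch both pairs display identically even though the underlying names are different. Escaping the backslash itself would remove the second collision, at the cost of making plain-ASCII names containing backslashes look different from what is stored.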
I don't have a strong feeling about what's best.

regards, tom lane