Re: [PATCH] hstore: Fix parsing on Mac OS X: isspace() is locale specific

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Evan Jones <evan(dot)jones(at)datadoghq(dot)com>, pgsql-hackers(at)postgresql(dot)org
Date: 2023-10-10 03:17:57
Message-ID: CA+hUKGKdZz+z39vLsYazV5U80JaQreGC71+x5ZRdDD5JBG53zw@mail.gmail.com
Lists: pgsql-hackers

FTR I ran into a benign case of the phenomenon in this thread when
dealing with row types. In rowtypes.c, we double-quote stuff
containing spaces, but we detect them by passing individual bytes of
UTF-8 sequences to isspace(). Like macOS, Windows thinks that 0xa0 is
a space when you do that, so for example the Korean character '점'
(code point U+C810, UTF-8 sequence EC A0 90) gets quotes on Windows but
not on Linux. That confused a migration/diff tool that was comparing
Windows and Linux database servers using that representation. Not a
big deal; I guess no one ever promised that the format was stable
across platforms, and I don't immediately see a way for anything more
serious to go wrong (though I may lack imagination). It does seem a
bit weird to be using locale-aware tokenising for a machine-readable
format, and then making sure its behaviour is undefined by feeding it
chopped-up bytes.
