From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | J Smith <dark(dot)panda+lists(at)gmail(dot)com> |
Cc: | Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: unaccent extension missing some accents |
Date: | 2011-11-07 00:15:04 |
Message-ID: | 27438.1320624904@sss.pgh.pa.us |
Lists: | pgsql-hackers |

J Smith <dark(dot)panda+lists(at)gmail(dot)com> writes:
> I've attached a patch against master for unaccent.c that uses swscanf
> along with char2wchar and wchar2char instead of sscanf directly to
> initialize the unaccent extension and it appears to fix the problem in
> both the master and 9.1 branches.

swscanf doesn't seem like an acceptable approach: it's a function that
is relied on nowhere else in PG, so it adds new portability risks of its
own. It doesn't exist on some platforms that we support (like the one
I'm typing this message on) and there's no real good reason to assume
that it's not broken in its own ways on others.

If you really want to pursue this, I'd suggest parsing the line
manually, perhaps via strchr searches for \t and \n. It likely wouldn't
be very many more lines than what you've got here.
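
A minimal sketch of that kind of manual parsing, assuming each rules line
has the form "src<TAB>trg<NEWLINE>". The helper name parse_rule_line is
hypothetical and is not taken from unaccent.c or from the patch under
discussion. It compares raw bytes only, so it never consults isspace() or
the locale:

```c
#include <string.h>

/*
 * Hypothetical helper (not from unaccent.c): split one rules line of the
 * assumed form "src\ttrg\n" in place.
 *
 * On success, *src and *trg point into the caller's buffer and the
 * function returns 1.  It returns 0 if the line has no tab or if either
 * field is empty.
 */
static int
parse_rule_line(char *line, char **src, char **trg)
{
    char *tab;
    char *eol;

    /* The source field ends at the first tab. */
    tab = strchr(line, '\t');
    if (tab == NULL || tab == line)
        return 0;
    *tab = '\0';

    /* The target field ends at the newline, if one is present. */
    eol = strchr(tab + 1, '\n');
    if (eol != NULL)
        *eol = '\0';
    if (tab[1] == '\0')
        return 0;

    *src = line;
    *trg = tab + 1;
    return 1;
}
```

Because only the bytes '\t' and '\n' are searched for, the bytes of a
multibyte UTF-8 character in either field pass through untouched. A real
version would also have to decide what to do with lines that separate the
fields with spaces rather than a tab.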

However, the bigger picture is that OS X's UTF8 locales are broken
through-and-through, and most of their other problems are not feasible
to work around. So basically you can't use them for anything
interesting, and it's not clear that it's worth putting any time into
solving individual problems. In the particular case here, the issue
presumably is that sscanf is relying on isspace() ... but we rely on
isspace() directly, in quite a lot of places, so how much is it going
to fix to dodge it right here?
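
One way to check the isspace() theory on a given platform is sketched
below. The function name count_high_space_bytes is hypothetical and nothing
like it exists in PG. It counts how many byte values in the range 0x80 to
0xFF isspace() reports as whitespace under a given LC_CTYPE setting:

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical diagnostic: count the byte values in 0x80..0xFF that
 * isspace() classifies as whitespace under the named LC_CTYPE locale.
 * Returns -1 if the locale cannot be set.
 *
 * Note that this changes the process's LC_CTYPE setting as a side effect.
 */
static int
count_high_space_bytes(const char *locale_name)
{
    int count = 0;
    int c;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* Every value here is representable as unsigned char, as isspace() requires. */
    for (c = 0x80; c <= 0xFF; c++)
    {
        if (isspace(c))
            count++;
    }
    return count;
}
```

In the "C" locale the count is zero. A nonzero count under a UTF8 locale
would mean that bytes inside multibyte characters can be taken for
whitespace, which would be consistent with the sscanf behavior described
above.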

regards, tom lane