From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Bruno Wolff III <bruno(at)wolff(dot)to> |
Cc: | rob stone <floriparob(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: endash not a graphic character? |
Date: | 2016-08-21 18:24:16 |
Message-ID: | 25067.1471803856@sss.pgh.pa.us |
Lists: | pgsql-general |
Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> However I am wondering about my use of [[:graph:]] to match characters
> that have glyphs. I was not expecting there to be characters that have
> glyphs to not be in the graph class. In the short term I might want to
> change the way I am testing that.
[ looks into code... ] The [[:foo:]] notations only work up to Unicode
code point U+7FF at the moment, per this comment in regc_pg_locale.c:
     * Decide how many character codes we ought to look through.  For C locale
     * there's no need to go further than 127.  Otherwise, if the encoding is
     * UTF8 go up to 0x7FF, which is a pretty arbitrary cutoff but we cannot
     * extend it as far as we'd like (say, 0xFFFF, the end of the Basic
     * Multilingual Plane) without creating significant performance issues due
     * to too many characters being fed through the colormap code.  This will
     * need redesign to fix reasonably, but at least for the moment we have
     * all common European languages covered.  Otherwise (not C, not UTF8) go
     * up to 255.  These limits are interrelated with restrictions discussed
     * at the head of this file.
Unfortunately, these particular characters (en dash and em dash) are
U+2013 and U+2014, well past that cutoff, so you lose.
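
To make the cutoff concrete, here is a minimal Python sketch of the limits
the quoted comment describes. It is not the PostgreSQL code: the function
and constant names are hypothetical, and it models only the upper bound on
code points that a [[:foo:]] class looks at, not the class membership test
itself.

```python
# Hypothetical sketch of the cutoff rule described in the quoted comment.
# None of these names appear in PostgreSQL; they are for illustration only.

C_LOCALE_MAX = 0x7F   # 127: C locale
UTF8_MAX = 0x7FF      # the "pretty arbitrary cutoff" for UTF8
OTHER_MAX = 0xFF      # 255: not C locale, not UTF8


def max_class_code_point(is_c_locale: bool, encoding: str) -> int:
    """Return the highest code point a character class would examine."""
    if is_c_locale:
        return C_LOCALE_MAX
    if encoding.upper() == "UTF8":
        return UTF8_MAX
    return OTHER_MAX


def class_can_match(ch: str, is_c_locale: bool, encoding: str) -> bool:
    """True if the character is low enough to be examined at all."""
    return ord(ch) <= max_class_code_point(is_c_locale, encoding)


if __name__ == "__main__":
    # ASCII hyphen, e-acute, en dash, em dash
    for ch in ("-", "\u00e9", "\u2013", "\u2014"):
        reachable = class_can_match(ch, False, "UTF8")
        print(f"U+{ord(ch):04X} reachable: {reachable}")
```

Under these assumptions, an accented Latin letter such as U+00E9 is below
the UTF8 limit, while U+2013 and U+2014 are above it and so can never land
in the graph class, whatever the locale says about them.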
Obviously there's room for improvement here, but so far nobody's been
motivated to work on it. Last discussion about it (AFAIR) was this
thread:
https://www.postgresql.org/message-id/flat/24241.1329347196%40sss.pgh.pa.us
I'm not sure if any of the subsequent work on the regex engine would
make it any easier to fix than it seemed at the time.
regards, tom lane