From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Gregory Maxwell <gmaxwell(at)gmail(dot)com> |
Cc: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Upcoming PG re-releases |
Date: | 2005-12-09 16:17:33 |
Message-ID: | 20051209161733.GC20352@svana.org |
Lists: | pgsql-hackers pgsql-www |
On Thu, Dec 08, 2005 at 05:54:35PM -0500, Gregory Maxwell wrote:
> No, what is needed for people who care about fixing their data is a
> loadable strip_invalid_utf8() that works in older versions.. then just
> select * from bar where foo != strip_invalid_utf8(foo); The function
> would be useful in general, for example, if you have an application
> which doesn't already have much utf8 logic, you want to use a text
> field, and stripping is the behaviour you want. For example, lots of
> simple web applications.
Would something like the following work? It's written in PL/pgSQL and
does (AFAICS) the same checking as the backend in recent releases,
except that the backend only supports up to 4-byte UTF-8 whereas this
function checks up to six bytes. For a six-byte UTF-8 character, who is
wrong?
In any case, people should be able to do something like:
  SELECT field FROM table WHERE NOT utf8_verify(field,4);
to check conformance with PostgreSQL 8.1. Note that I don't have large
chunks of UTF-8 to test with, but it works for the characters I tried.
Tested with 7.4.
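For readers without the attachment, the kind of check described above can be sketched roughly as follows. This is a Python illustration, not the attached utf8_verify.sql: the function name and the maximum-length argument only mirror the query above, and it assumes a purely structural check (the lead byte gives the sequence length, every following byte must be 10xxxxxx). It does not reject overlong encodings or surrogates, and it may differ in detail from what the backend does.

```python
def utf8_verify(data: bytes, max_len: int = 6) -> bool:
    """Structural UTF-8 check (hypothetical sketch, not the attached function).

    Returns True if every sequence in `data` has a valid lead byte, is at
    most `max_len` bytes long, and is followed by the right number of
    continuation bytes (10xxxxxx).
    """
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # Derive the sequence length from the lead byte's high bits.
        if lead < 0x80:
            seq_len = 1          # 0xxxxxxx: plain ASCII
        elif (lead & 0xE0) == 0xC0:
            seq_len = 2          # 110xxxxx
        elif (lead & 0xF0) == 0xE0:
            seq_len = 3          # 1110xxxx
        elif (lead & 0xF8) == 0xF0:
            seq_len = 4          # 11110xxx
        elif (lead & 0xFC) == 0xF8:
            seq_len = 5          # 111110xx (original UTF-8 definition only)
        elif (lead & 0xFE) == 0xFC:
            seq_len = 6          # 1111110x (original UTF-8 definition only)
        else:
            return False         # stray continuation byte, or 0xFE/0xFF

        # Reject sequences longer than the caller allows, or truncated ones.
        if seq_len > max_len or i + seq_len > n:
            return False

        # Every byte after the lead must be a continuation byte.
        for b in data[i + 1:i + seq_len]:
            if (b & 0xC0) != 0x80:
                return False

        i += seq_len
    return True
```

With a maximum of 4, a five-byte sequence such as F8 88 80 80 80 is reported as invalid, while with the default of 6 it passes the structural check, which is the difference between the two behaviours discussed above.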
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
Attachment | Content-Type | Size |
---|---|---|
utf8_verify.sql | text/plain | 2.1 KB |