From: | "Phoenix Kiula" <phoenix(dot)kiula(at)gmail(dot)com> |
---|---|
To: | "Postgres General" <pgsql-general(at)postgresql(dot)org> |
Subject: | FInding "corrupt" values in UTF-8 tables (regexp question, I think) |
Date: | 2007-08-17 15:58:05 |
Message-ID: | e373d31e0708170858o51492ebaq53283c14273fcdf2@mail.gmail.com |
Lists: | pgsql-general |
I'm noticing that some of my data has been imported as junk text.
For instance:
klciã«"
What would be the SQL to find data of this nature? My column can only
have alphanumeric data, and the only symbols allowed are "-" and "_",
so I tried this regexp query:
select id, t_code
from traders
where t_code ~ '[^A-Za-z1-9\-]'
limit 100;
But this returns values such as "181xn-807199", which should be valid
as per the above regexp. Also, when I try to include the underscore,
as follows...
select id, t_code
from traders
where t_code ~ '[^A-Za-z1-9\-\_]'
limit 100;
This gives me an error: "ERROR: invalid regular expression: invalid
character range".
What am I missing? Does this have something to do with erroneous
encodings? I want my data to be UTF-8, but I also want to be able to
find it with latin1 queries when the text in a column is supposed to
contain only latin1 characters. Or is "a-z" in UTF-8 considered
different from "a-z" in latin1?
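
For reference, here is a sketch of the query as the rule is stated above, assuming the intent is to allow ASCII letters, all ten digits, "-" and "_". Two things differ from the attempts above, and both are assumptions about what was meant: the digit range is written 0-9 (the class 1-9 leaves out "0", and "181xn-807199" contains zeros), and the hyphen is placed last in the bracket expression, where it is read as a literal character rather than as the start of a range.

```sql
-- Sketch only, not a confirmed fix: flag rows whose t_code contains any
-- character other than ASCII letters, digits 0-9, underscore, or hyphen.
-- Assumption: "0" is meant to be allowed (the earlier class listed 1-9).
-- The hyphen comes last so it cannot be parsed as part of a range.
select id, t_code
from traders
where t_code ~ '[^A-Za-z0-9_-]'
limit 100;
```

With this class, a value such as "181xn-807199" should no longer be returned, while a value containing non-ASCII characters should be.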