From: | "Phil Endecott" <spam_from_pgsql_lists(at)chezphil(dot)org> |
---|---|
To: | <pgsql-general(at)lists(dot)postgresql(dot)org> |
Cc: | "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
Subject: | Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes... |
Date: | 2018-10-07 12:45:07 |
Message-ID: | 1538916307632@dmwebmail.dmwebmail.chezphil.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Andrew,
Thanks for your great reply.
Andrew Gierth wrote:
>>>>>> "Phil" == Phil Endecott <spam_from_pgsql_lists(at)chezphil(dot)org> writes:
> Phil> As a hack I tried ALTER FUNCTION to make it immutable,
>
> A better approach is to wrap it in a function of your own which is
> declared immutable, rather than hacking the catalogs:
>
> create function from_utf8(bytea) returns text language plpgsql immutable
> as $$ begin return convert_from($1, 'UTF8'); end; $$;
Thanks. I'm a bit surprised that it will allow me to declare a
function immutable if it calls functions that aren't themselves
immutable, but if it works... great.
> Phil> Anyway: given the problem of creating a text search index over
> Phil> bytea data that contains UTF-8 text, which may include oddities
> Phil> like null bytes, what would you do?
>
> You can search for 0x00 in a bytea using position() or LIKE. What do you
> want to do with values that contain null bytes? or values which you
> think are supposed to be valid utf8 text but are not?
As long as it doesn't crash I don't really care; it would be better
if the text search worked for the valid parts of the text but even
that isn't essential.
I think I will probably need to do some more preprocessing on the
data when I load it, at least to remove the null bytes.
Regards, Phil.
From | Date | Subject | |
---|---|---|---|
Next Message | ROS Didier | 2018-10-07 13:13:24 | RE: Why the index is not used ? |
Previous Message | Andrew Gierth | 2018-10-07 02:52:25 | Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes... |