Re: [GENERAL] workaround for lack of REPLACE() function

From: Joe Conway <mail(at)joeconway(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, lockhart(at)fourpalms(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: [GENERAL] workaround for lack of REPLACE() function
Date: 2002-08-11 17:15:43
Message-ID: 3D569BBF.8040309@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-patches

Tatsuo Ishii wrote:
>>Any objection if I rework this function to meet SQL92 and fix the bug?
>

I've started working on text_substr() as described in this thread (which
is hopefully prep work for the replace() function that started the
thread). I haven't really looked at toast or multibyte closely before,
so I'd like to ask a couple of questions to be sure I'm understanding
the relevant issues correctly.

First, in textlen() I see (ignoring multibyte for a moment):

text *t = PG_GETARG_TEXT_P(0);
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);

Tom has pointed out to me before that PG_GETARG_TEXT_P(n) incurs the
overhead of retrieving and possibly decompressing a toasted datum. So my
first question is, can we simply do:
PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ);
and save the overhead of retrieving and decompressing the whole datum?

Now, in the multibyte case, again in textlen(), I see:

/* optimization for single byte encoding */
if (pg_database_encoding_max_length() <= 1)
PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);

PG_RETURN_INT32(
pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ));

Three questions here.
1) In the case of encoding max length == 1, can we treat it the same as
the non-multibyte case (I presume they are exactly the same)?

2) Can encoding max length ever be < 1? Doesn't make sense to me.

3) In the case of encoding max length > 1, if I understand correctly,
each encoded character can be one *or more* bytes, up to and encluding
encoding max length bytes. So the *only* way presently to get the length
of the original character string is to loop through the entire string
checking the length of each individual character (that's what
pg_mbstrlen_with_len() does it seems)?

Finally, if 3) is true, then there is no way to avoid the retrieval and
decompression of the datum just to find out its length. For large
datums, detoasting plus the looping through each character would add a
huge amount of overhead just to get at the length of the original
string. I don't know if we need to be able to get *just* the length
often enough to really care, but if we do, I had an idea for some future
release (I wouldn't propose doing this for 7.3):

- add a new EXTENDED state to va_external for MULTIBYTE
- any string with max encoding length > 1 would be EXTENDED even if it
is not EXTERNAL and not COMPRESSED.
- to each of the structs in the union, add va_strlen
- populate va_strlen on INSERT and maintain it on UPDATE.

Now a new function similar to toast_raw_datum_size(), maybe
toast_raw_datum_strlen() could be used to get the original string
length, whether MB or not, without needing to retrieve and decompress
the entire datum.

I understand we would either: have to steal another bit from the VARHDR
which would reduce the effective size of a valena from 1GB down to .5GB;
or we would need to add a byte or two to the VARHDR which is extra
per-datum overhead. I'm not sure we would want to do either. But I
wanted to toss out the idea while it was fresh on my mind.

Thanks,

Joe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Cornelia Boenigk 2002-08-11 17:40:12 Re: libpq
Previous Message Cornelia Boenigk 2002-08-11 15:57:37 Re: lo_import

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Weimer 2002-08-11 17:17:20 Re: [SECURITY] DoS attack on backend possible (was: Re:
Previous Message Tom Lane 2002-08-11 17:09:41 Re: [SECURITY] DoS attack on backend possible (was: Re:

Browse pgsql-patches by date

  From Date Subject
Next Message Gavin Sherry 2002-08-11 17:22:08 create or replace rule/view
Previous Message Gavin Sherry 2002-08-11 16:00:55 Re: CREATE OR REPLACE TRIGGER