From: | Joe Conway <mail(at)joeconway(dot)com> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | pgsql-patches(at)postgresql(dot)org, lockhart(at)fourpalms(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us |
Subject: | Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function |
Date: | 2002-08-14 06:15:25 |
Message-ID: | 3D59F57D.1070804@joeconway.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers pgsql-patches |
Tatsuo Ishii wrote:
> Joe Conway wrote:
>>Any objection if I rework this function to meet SQL92 and fix the bug?
>
> I don't object.
>
>>Or is the SQL92 part not desirable because it breaks backward
>>compatability?
>
> I don't think so.
>
>>In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
>>test for encoding max length?
>
> Sure.
<sorry so long-winded>
Attached is a patch that implements the above items wrt text_substr(). I
also modified textlen(), textoctetlen(), byteaoctetlen(), and
bytea_substr(). Here's a summary of the change to each:
- text_substr(): rewrite function to meet SQL92, fix MB related bug,
and remove #ifdef MULTIBYTE.
- bytea_substr(): same as text_substr() enc max len == 1.
- textoctetlen(), byteaoctetlen(): use
toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ)
to avoid detoasting.
- textlen(): same as textoctetlen() for enc max len == 1, and remove
#ifdef MULTIBYTE.
I did some benchmarking to ensure no performance degradation, and to
help me understand MB and related performance issues. The results were
very enlightening:
===================================================================
First test - textlen() (already reported, repeated here for completeness):
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select length(f1) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, new code 2 seconds
SQL_ASCII database, old code 66 seconds
EUC_JP database, new & old code 469 seconds
===================================================================
Second test - short string test:
-------------------------------------------------------------------
create table parts(partnum text);
<fill with ~220000 rows, 8 to 12 characters each>
do 300 times
select substr(partnum, 3, 3) from parts;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 352 seconds
SQL_ASCII database, new code 350 seconds
EUC_JP database, old code 461 seconds
EUC_JP database, new code 422 seconds
===================================================================
Third test - long string, EXTENDED storage (EXTERNAL+COMPRESSED):
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select substr(f1, 89000, 10000) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 59 seconds
SQL_ASCII database, new code 58 seconds
EUC_JP database, old code 915 seconds
EUC_JP database, new code 912 seconds
===================================================================
Forth test - long string, EXTERNAL storage (not COMPRESSED)
-------------------------------------------------------------------
create table strtest(f1 text);
do 100 times
insert into strtest values('12345....'); -- 100000 characters
loop
do 1000 times
select substr(f1, 89000, 10000) from strtest;
loop
-------------------------------------------------------------------
Results:
SQL_ASCII database, old code 17 seconds
SQL_ASCII database, new code 17 seconds
EUC_JP database, old code 918 seconds
EUC_JP database, new code 911 seconds
The only remaining problem is that this causes opr_sanity to fail based
on this query:
-- Considering only built-in procs (prolang = 12), look for multiple
-- uses of the same internal function (ie, matching prosrc fields).
-- It's OK to have several entries with different pronames for the same
-- internal function, but conflicts in the number of arguments and other
-- critical items should be complained of.
SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid != p2.oid AND
p1.prosrc = p2.prosrc AND
p1.prolang = 12 AND p2.prolang = 12 AND
(p1.prolang != p2.prolang OR
p1.proisagg != p2.proisagg OR
p1.prosecdef != p2.prosecdef OR
p1.proisstrict != p2.proisstrict OR
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs);
This fails because I implemented text_substr() and bytea_substr() to
take either 2 or 3 args. This was necessary for SQL92 spec compliance.
SQL92 requires L < 0 to throw an error, and L IS NULL to return NULL. It
also requires that if L is not provided, the length to the end of the
string is assumed. Current code handles L IS NULL correctly but not L <
0 -- it assumes L < 0 is the same as L is not provided. By allowing the
function to determine if it was passed 2 or 3 args, this can be handled
properly.
So the question is, can/should I change opr_sanity to allow this case?
I also still owe some additions to the strings regression test to make
it cover toasted values.
Other than those two issues, I think the patch is ready to go. I'm
planning to take on the replace function next.
Thanks,
Joe
Attachment | Content-Type | Size |
---|---|---|
substr.2002.08.13.3.patch | text/plain | 15.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-08-14 06:23:42 | Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function |
Previous Message | Patrick Nelson | 2002-08-14 06:03:44 | Re: Blob stuff |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-08-14 06:23:42 | Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function |
Previous Message | Hannu Krosing | 2002-08-14 06:11:59 | Re: Open 7.3 items |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-08-14 06:23:42 | Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function |
Previous Message | Bruce Momjian | 2002-08-14 06:05:08 | Re: [HACKERS] CREATE TEMP TABLE .... ON COMMIT |