From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING |
Date: | 2014-07-12 02:16:17 |
Message-ID: | 1405131377.9081.244.camel@jeff-desktop |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 2014-07-11 at 11:51 -0400, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > Attached is a small patch to $SUBJECT.
> > In master, only single-byte characters are allowed as an escape. Of
> > course, with the patch it must still be a single character, but it may
> > be multi-byte.
>
> I'm concerned about the performance cost of this patch. Have you done
> any measurements about what kind of overhead you are putting on the
> inner loop of similar_escape?
I didn't consider this very performance critical, because this is
looping through the pattern, which I wouldn't expect to be a long
string. On my machine using en_US.UTF-8, the difference is imperceptible
for a SIMILAR TO ... ESCAPE query.
I was able to see about a 2% increase in runtime when using the
similar_escape function directly. I made a 10M tuple table and did:
explain analyze
select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#') from t;
which was the worst reasonable case I could think of. (It appears that
selecting from a table is faster than from generate_series. I'm curious
what you use when testing the performance of an individual function at
the SQL level.)
> At the very least, please don't call GetDatabaseEncoding() over again
> every single time through the inner loop. More generally, why do we
> need to use pg_encoding_verifymb? The input data is presumably validly
> encoded. ISTM the significantly cheaper pg_mblen() would be more
> appropriate.
Thank you. Using the non-verifying variants reduces the penalty in the
above test to 1%.
If needed, we could optimize further through code specialization, as
like_escape() does. Though I think like_escape() is specialized just
because MatchText() is specialized; and MatchText is specialized because
it operates on the actual data, not just the pattern. So I don't see a
reason to specialize similar_escape().
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
similar-escape.patch | text/x-patch | 3.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2014-07-12 03:43:50 | Re: Crash on backend exit w/ OpenLDAP [2.4.24, 2.4.31] |
Previous Message | Robert Haas | 2014-07-11 21:07:38 | Re: proposal: rounding up time value less than its unit. |