Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: ldh(at)laurent-hasson(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)
Date: 2021-08-30 03:16:48
Message-ID: 20210830031648.GU26465@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Aug 30, 2021 at 04:43:23AM +0200, Pavel Stehule wrote:
> po 30. 8. 2021 v 2:44 odesílatel ldh(at)laurent-hasson(dot)com napsal:
> > At this point, I am not sure how to proceed except to rethink that
> > toFloat() function and many other places where we use exceptions. We get
> > such dirty data that I need a "safe" way to convert a string to float
> > without throwing an exception. BTW, I tried other combinations in case
> > there may have been some weird interactions with the ::REAL conversion
> > operator, but nothing made any change. Could you recommend another approach
> > off the top of your head? I could use regexes for testing etc... Or maybe
> > there is another option like a no-throw conversion that's built in or in
> > some extension that you may know of? Like the "SAFE." Prefix in BigQuery.
>
> CREATE OR REPLACE FUNCTION safe_to_double_precision(t text)
> RETURNS double precision AS $$
> BEGIN
> IF $1 SIMILAR TO '[+-]?([0-9]*[.])?[0-9]+' THEN
> RETURN $1::double precision;
> ELSE
> RETURN NULL;
> END IF;
> END;
> $$ LANGUAGE plpgsql IMMUTABLE STRICT;

This tries to use a regex to determine if something is a "Number" or not.
Which has all the issues enumerated in painful detail by long answers on stack
overflow, and other wiki/blog/forums.

Rather than trying to define Numbers using regex, I'd try to avoid only the
most frequent exceptions and get 90% of the performance back. I don't know
what your data looks like, but you might try things like this:

IF $1 IS NULL THEN RETURN $2
ELSE IF $1 ~ '^$' THEN RETURN $2
ELSE IF $1 ~ '[[:alpha:]]{2}' THEN RETURN $2
ELSE IF $1 !~ '[[:digit:]]' THEN RETURN $2
BEGIN
RETURN $1::float;
EXCEPTION WHEN OTHERS THEN
RETURN $2;
END;

You can check the stackoverflow page for ideas as to what kind of thing to
reject, but it may depend mostly on your data (what is the most common string?
The most common exceptional string?).

I think it's possible that could even be *faster* than the original, since it
avoids the exception block for values which are for sure going to cause an
exception anyway. It might be that using alternation (|) is faster (if less
readable) than using a handful of IF branches.

--
Justin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message ldh@laurent-hasson.com 2021-08-30 04:20:38 RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4 (workarounds)
Previous Message Pavel Stehule 2021-08-30 02:43:23 Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4