RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

From: "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Date: 2021-08-30 00:44:22
Message-ID: MN2PR15MB2560B122A49C4C06483183D785CB9@MN2PR15MB2560.namprd15.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> -----Original Message-----
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Sent: Sunday, August 29, 2021 12:19
> To: ldh(at)laurent-hasson(dot)com
> Cc: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>; Andrew Dunstan
> <andrew(at)dunslane(dot)net>; Justin Pryzby <pryzby(at)telsasoft(dot)com>; pgsql-
> performance(at)postgresql(dot)org
> Subject: Re: Big Performance drop of Exceptions in UDFs between V11.2
> and 13.4
>
> "ldh(at)laurent-hasson(dot)com" <ldh(at)laurent-hasson(dot)com> writes:
> > Is it possible that the client I am using or the way I am creating the test
> database might affect this scenario? I use DBeaver and use the default
> settings to create the database:
> > - default encoding: UTF8
> > - collate: English_United States.1252
> > - ctype: English_United States.1252
>
> Yeah, I was thinking of quizzing you about that. I wonder whether
> something is thinking it needs to transcode to WIN1252 encoding and
> then back to UTF8, based on the .1252 property of the LC_XXX settings.
> That shouldn't account for any 500X factor either, but we're kind of
> grasping at straws here.
>
> Does Windows have any locale choices that imply UTF8 encoding
> exactly, and if so, do your results change when using that? Alternatively,
> try creating a database with WIN1252 encoding and those locale
> settings.
>
> regards, tom lane

Yeah, grasping at straws... and no material changes 😊 This is mystifying.

show lc_messages;
-- English_United States.1252

create table sampletest (a varchar, b varchar);
insert into sampletest (a, b)
select substr(md5(random()::text), 0, 15), (100000000*random())::integer::varchar
from generate_series(1,100000);

CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

explain (analyze,buffers,COSTS,TIMING)
select MAX(toFloat(a, null)) as "a" from sampletest
--Aggregate (cost=1477.84..1477.85 rows=1 width=4) (actual time=89527.032..89527.033 rows=1 loops=1)
-- Buffers: shared hit=647
-- -> Seq Scan on sampletest (cost=0.00..1197.56 rows=56056 width=32) (actual time=0.024..37.811 rows=100000 loops=1)
-- Buffers: shared hit=637
--Planning:
-- Buffers: shared hit=24
--Planning Time: 0.347 ms
--Execution Time: 89527.501 ms

explain (analyze,buffers,COSTS,TIMING)
select MAX(toFloat(b, null)) as "b" from sampletest
--Aggregate (cost=2137.00..2137.01 rows=1 width=4) (actual time=186.605..186.606 rows=1 loops=1)
-- Buffers: shared hit=637
-- -> Seq Scan on sampletest (cost=0.00..1637.00 rows=100000 width=8) (actual time=0.008..9.679 rows=100000 loops=1)
-- Buffers: shared hit=637
--Planning:
-- Buffers: shared hit=4
--Planning Time: 0.339 ms
--Execution Time: 186.641 ms

At this point, I am not sure how to proceed except to rethink that toFloat() function and many other places where we use exceptions. We get such dirty data that I need a "safe" way to convert a string to float without throwing an exception. BTW, I tried other combinations in case there may have been some weird interactions with the ::REAL conversion operator, but nothing made any change. Could you recommend another approach off the top of your head? I could use regexes for testing etc... Or maybe there is another option like a no-throw conversion that's built in or in some extension that you may know of? Like the "SAFE." Prefix in BigQuery.

Thank you,
Laurent.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ranier Vilela 2021-08-30 01:55:53 Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4
Previous Message ldh@laurent-hasson.com 2021-08-30 00:29:26 RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4