Functions as conditions

From: Bao Ton <oabnot(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Functions as conditions
Date: 2016-04-21 19:24:09
Message-ID: CAPCuAgyDO0bPYyKwX-=qjqmQMuB51p3VPCC826_6n6oK52SkZw@mail.gmail.com
Lists: pgsql-novice

Hello all,

I'd like some verification on how functions used as WHERE conditions work
and perform on a table -- specifically a function that takes a column as
input and returns a boolean.

psql --version
psql (PostgreSQL) 9.5.0

Suppose I have a table with two columns, id and an md5, with 700000 rows:

CREATE TABLE t_random AS
    SELECT s, md5(random()::text)
    FROM generate_series(1, 700000) s;

Let's create a function that checks if column s is over 500:

CREATE OR REPLACE FUNCTION test_func(integer)
RETURNS boolean
LANGUAGE plpgsql
AS $function$
BEGIN
    IF $1 > 500 THEN
        RETURN true;
    ELSE
        RETURN false;
    END IF;
END
$function$;
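(As an aside, and not something I have benchmarked here: the IF/ELSE can be collapsed to a single RETURN, and a SQL-language version of the same function is something the planner can inline into the calling query, so that "WHERE test_func(s)" is planned like "WHERE s > 500". A sketch:

```sql
-- Equivalent plpgsql body, collapsed:
--     RETURN $1 > 500;
-- SQL-language version, eligible for inlining by the planner:
CREATE OR REPLACE FUNCTION test_func(integer)
RETURNS boolean
LANGUAGE sql
IMMUTABLE
AS $function$
    SELECT $1 > 500;
$function$;
```

IMMUTABLE is honest here only because the comparison depends solely on the input value.)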

Running:
EXPLAIN ANALYZE SELECT * FROM t_random WHERE test_func(s);

                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on t_random  (cost=0.00..187834.00 rows=233333 width=37) (actual time=0.505..572.463 rows=699500 loops=1)
   Filter: test_func(s)
   Rows Removed by Filter: 500
 Planning time: 0.025 ms
 Execution time: 600.917 ms

As opposed to:
EXPLAIN ANALYZE SELECT * FROM t_random WHERE s > 500;

                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on t_random  (cost=0.00..14584.00 rows=699546 width=37) (actual time=0.080..75.001 rows=699500 loops=1)
   Filter: (s > 500)
   Rows Removed by Filter: 500
 Planning time: 0.029 ms
 Execution time: 98.257 ms

The first query plan has a horrendous execution time, and its row estimate
(233333 vs. the actual 699500) is way off when test_func() is used. I'm
assuming that's because the planner has no statistics about the function's
output, so it falls back on a default selectivity, and it can't make use of
any index either since the function is opaque to it. Is that correct? I'm
not exactly sure of the reason here.
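(For what it's worth, one avenue I have read about but not tried: an expression index on the function call. This assumes the function can honestly be declared IMMUTABLE, which PostgreSQL requires for expression indexes; after the next ANALYZE the planner also gathers statistics on the indexed expression, which should fix the row estimate. A sketch:

```sql
-- Requires test_func to be declared IMMUTABLE.
CREATE INDEX t_random_test_func_idx ON t_random (test_func(s));
ANALYZE t_random;
-- Now "WHERE test_func(s)" can use the index and has real statistics.
```

That would not help my real case below, where the function's result depends on the environment and so cannot be IMMUTABLE.)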

Let's assume that this function is absolutely necessary and that its result
changes based on some environment variables, while maintaining only a
single table. We cannot store a static column of precomputed boolean
values, since the result will always change with the environment. Would a
good solution then be to use a view that adds a column based on the
function?

CREATE VIEW t_view AS SELECT *, test_func(s) AS passed FROM t_random;

Running:
EXPLAIN ANALYZE SELECT * FROM t_view WHERE passed = true;

                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t_view  (cost=0.00..194834.00 rows=350000 width=38) (actual time=2.198..661.008 rows=699500 loops=1)
   Filter: t_view.test_func
   Rows Removed by Filter: 500
   ->  Seq Scan on t_random  (cost=0.00..187834.00 rows=700000 width=37) (actual time=0.026..579.125 rows=700000 loops=1)
 Planning time: 0.040 ms
 Execution time: 690.057 ms

The query plan does a bit better but is this the best plan I can hope to
achieve with the specific function? Materialized views are not an option
since the environment is changing constantly and I would like an
always-up-to-date database.
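(One variant I have been considering for the environment-driven part, sketched here with a hypothetical setting name "myapp.cutoff": read the threshold from a session setting via current_setting() and mark the function STABLE, which tells the planner the result won't change within a single query:

```sql
-- Hypothetical setting name; STABLE because the result is fixed
-- for the duration of one query but may change between queries.
CREATE OR REPLACE FUNCTION test_func(integer)
RETURNS boolean
LANGUAGE sql
STABLE
AS $function$
    SELECT $1 > current_setting('myapp.cutoff')::integer;
$function$;

SET myapp.cutoff = '500';
SELECT count(*) FROM t_random WHERE test_func(s);
```

I don't know whether this plans any better than the plpgsql version, which is part of what I'm asking.)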

Thank you,
-Bao
