Re: Strange behavior of function date_trunc

From: Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Strange behavior of function date_trunc
Date: 2021-05-06 13:12:15
Message-ID: 1b32883d-4078-54d2-96ae-297178f8ea01@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

On 05.05.2021 16:55, Tomas Vondra wrote:
> Well, it'd not like date_trunc is executed for each row while now() is
> executed only once. The functions are executed for each row in both
> cases...
A couple more experiments.
Since I can't to track usage of system functions, I decided to play with
user defined stable function.

In this case it is a wrapper for date_trunc:
CREATE OR REPLACE FUNCTION user_date_trunc(field text, source
timestamptz) RETURNS timestamptz
AS $$
BEGIN
    RETURN date_trunc(field, source);
END;
$$ LANGUAGE plpgsql STABLE;

SET track_functions = 'all';
SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc);

Let's create a test table:
CREATE TABLE t AS
    SELECT g.x, g.x::text AS y
    FROM generate_series('2021-01-01', '2021-06-01', '1 s'::interval)
AS g(x) ORDER BY random();
VACUUM ANALYZE t;
SET random_page_cost = 1.1; -- for ssd disks

Previous query with user defined function:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..3520631.00 rows=13045175 width=31) (actual
time=63.977..7157.852 rows=13046401 loops=1)
   Filter: (x >= user_date_trunc('year'::text, '2021-05-06
00:00:00+03'::timestamp with time zone))
 Settings: random_page_cost = '1.1'
 Planning Time: 0.373 ms
 JIT:
   Functions: 2
   Options: Inlining true, Optimization true, Expressions true,
Deforming true
   Timing: Generation 0.404 ms, Inlining 53.122 ms, Optimization 7.100
ms, Emission 3.600 ms, Total 64.227 ms
 Execution Time: 7502.564 ms

SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid =
'user_date_trunc'::regproc;
    funcname     |  calls
-----------------+----------
 user_date_trunc | 13046402

The function is executed once for each row in the table +1. As you
explained earlier.
Ok.

But when there is an index on the column, the situation changes:
CREATE INDEX ON t(x);
ANALYZE t;
SELECT pg_stat_reset_single_function_counters('user_date_trunc'::regproc);

EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= user_date_trunc('year', '2021-05-06'::timestamptz);
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_x_idx on t  (cost=0.69..373188.48 rows=13046480
width=31) (actual time=3.163..26476.012 rows=13046401 loops=1)
   Index Cond: (x >= user_date_trunc('year'::text, '2021-05-06
00:00:00+03'::timestamp with time zone))
 Settings: random_page_cost = '1.1'
 Planning Time: 0.294 ms
 JIT:
   Functions: 3
   Options: Inlining false, Optimization false, Expressions true,
Deforming true
   Timing: Generation 0.551 ms, Inlining 0.000 ms, Optimization 0.380
ms, Emission 2.587 ms, Total 3.518 ms
 Execution Time: 26787.377 ms

SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid =
'user_date_trunc'::regproc;
    funcname     | calls
-----------------+-------
 user_date_trunc |     2

Only two times!
Does having an index allow the function value to be cached?

By the way, an index is used to access the table, although seq scan are
much better.

Now with scalar subquery:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= (SELECT user_date_trunc('year', '2021-05-06'::timestamptz));
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_x_idx on t  (cost=0.70..194745.67 rows=4348827
width=31) (actual time=3.000..26543.395 rows=13046401 loops=1)
   Index Cond: (x >= $0)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual
time=0.030..0.030 rows=1 loops=1)
 Settings: random_page_cost = '1.1'
 Planning Time: 0.071 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true,
Deforming true
   Timing: Generation 0.671 ms, Inlining 0.000 ms, Optimization 0.188
ms, Emission 2.583 ms, Total 3.441 ms
 Execution Time: 26875.105 ms

SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid =
'user_date_trunc'::regproc;
    funcname     | calls
-----------------+-------
 user_date_trunc |     3

The function is executed once, but the index is still in use.

The second question. What is the reason for choosing an index scan?

Seq scan with constant value only:
EXPLAIN (ANALYZE, SETTINGS)
SELECT * FROM t
WHERE t.x >= '2021-01-01'::timestamptz;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..259011.00 rows=13046480 width=31) (actual
time=2.135..913.628 rows=13046401 loops=1)
   Filter: (x >= '2021-01-01 00:00:00+03'::timestamp with time zone)
 Settings: random_page_cost = '1.1'
 Planning Time: 0.140 ms
 JIT:
   Functions: 2
   Options: Inlining false, Optimization false, Expressions true,
Deforming true
   Timing: Generation 0.392 ms, Inlining 0.000 ms, Optimization 0.163
ms, Emission 1.837 ms, Total 2.391 ms
 Execution Time: 1195.985 ms

The statistics on t.x shows that the condition in the query is not
selective and seq scan are preferred over index scan.
SELECT n_distinct, (histogram_bounds::text::text[])[1]
FROM pg_stats WHERE tablename = 't' AND attname = 'x';
 n_distinct |    histogram_bounds
------------+------------------------
         -1 | 2021-01-01 00:01:10+03

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2021-05-06 13:19:55 Re: Chain Hashing
Previous Message Vijaykumar Jain 2021-05-06 11:38:01 Re: trigger impacting insertion of records