Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Philip Semanchuk <philip(at)americanefficient(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
Date: 2021-06-01 19:15:25
Message-ID: CAM+6J97mxM5zPNuXxU_Jnqiyc5znRF3GTXd6dB_yfvpYWouuEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

ok i see this.
i may be wrong, but even when i force parallel cost to 0,
i only get workers to create mv, but refresh mv plan does not use workers
for the same conf params.

*******************
postgres=# create table if not exists t( id int primary key, value int );
CREATE TABLE
postgres=# insert into t select x,x from generate_series(1, 100000) x;
INSERT 0 100000
postgres=# analyze t;
ANALYZE
*************

postgres=# drop materialized view mv;
DROP MATERIALIZED VIEW
postgres=# explain analyze create materialized view mv AS select
round(avg(id)), sum(id) from t, pg_sleep(10);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2943.02..2943.03 rows=1 width=40) (actual
time=10027.940..10027.941 rows=1 loops=1)
-> Nested Loop (cost=0.00..2443.01 rows=100000 width=4) (actual
time=10010.513..10022.985 rows=100000 loops=1)
-> Function Scan on pg_sleep (cost=0.00..0.01 rows=1 width=0)
(actual time=10010.497..10010.498 rows=1 loops=1)
-> Seq Scan on t (cost=0.00..1443.00 rows=100000 width=4)
(actual time=0.012..5.841 rows=100000 loops=1)
Planning Time: 0.245 ms
Execution Time: 10039.621 ms
(6 rows)

postgres=# drop materialized view mv;
DROP MATERIALIZED VIEW
postgres=# set parallel_setup_cost=0;
SET
postgres=# set parallel_tuple_cost=0;
SET
postgres=# set min_parallel_table_scan_size=0;
SET
postgres=# set max_parallel_workers_per_gather=4;
SET
postgres=# explain analyze create materialized view mv AS select
round(avg(id)), sum(id) from t, pg_sleep(10);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1318.04..1318.05 rows=1 width=40) (actual
time=10042.197..10042.457 rows=1 loops=1)
-> Gather (cost=1318.00..1318.01 rows=4 width=40) (actual
time=10041.941..10042.450 rows=5 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial Aggregate (cost=1318.00..1318.01 rows=1 width=40)
(actual time=10035.167..10035.168 rows=1 loops=5)
-> Nested Loop (cost=0.00..1193.00 rows=25000 width=4)
(actual time=10011.980..10033.456 rows=20000 loops=5)
-> Parallel Seq Scan on t (cost=0.00..693.00
rows=25000 width=4) (actual time=0.005..5.791 rows=20000 loops=5)
-> Function Scan on pg_sleep (cost=0.00..0.01 rows=1
width=0) (actual time=0.501..0.501 rows=1 loops=100000)
Planning Time: 0.105 ms
Execution Time: 10059.992 ms
(10 rows)

postgres=# refresh materialized view mv;
REFRESH MATERIALIZED VIEW

*************************************************** auto explain in logs

2021-06-02 00:41:44.294 IST [2687] LOG: statement: explain analyze create
materialized view mv AS select round(avg(id)), sum(id) from t, pg_sleep(10);
2021-06-02 00:41:54.361 IST [2687] LOG: duration: 10059.566 ms plan:
Query Text: explain analyze create materialized view mv AS select
round(avg(id)), sum(id) from t, pg_sleep(10);
Finalize Aggregate (cost=1318.04..1318.05 rows=1 width=40) (actual
time=10042.197..10042.457 rows=1 loops=1)
Output: round(avg(t.id), 0), sum(t.id)
Buffers: shared hit=443
-> Gather (cost=1318.00..1318.01 rows=4 width=40) (actual
time=10041.941..10042.450 rows=5 loops=1)
Output: (PARTIAL avg(t.id)), (PARTIAL sum(t.id))
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=443
-> Partial Aggregate (cost=1318.00..1318.01 rows=1
width=40) (actual time=10035.167..10035.168 rows=1 loops=5)
Output: PARTIAL avg(t.id), PARTIAL sum(t.id)
Buffers: shared hit=443
Worker 0: actual time=10033.316..10033.316 rows=1
loops=1
Buffers: shared hit=62
Worker 1: actual time=10033.162..10033.163 rows=1
loops=1
Buffers: shared hit=55
Worker 2: actual time=10034.946..10034.946 rows=1
loops=1
Buffers: shared hit=117
Worker 3: actual time=10033.210..10033.211 rows=1
loops=1
Buffers: shared hit=103
-> Nested Loop (cost=0.00..1193.00 rows=25000
width=4) (actual time=10011.980..10033.456 rows=20000 loops=5)
Output: t.id
Buffers: shared hit=443
Worker 0: actual time=10017.958..10032.681
rows=14012 loops=1
Buffers: shared hit=62
Worker 1: actual time=10014.150..10032.520
rows=12430 loops=1
Buffers: shared hit=55
Worker 2: actual time=10007.133..10029.864
rows=26442 loops=1
Buffers: shared hit=117
Worker 3: actual time=10010.339..10032.137
rows=23160 loops=1
Buffers: shared hit=103
-> Parallel Seq Scan on public.t
(cost=0.00..693.00 rows=25000 width=4) (actual time=0.005..5.791 rows=20000
loops=5)
Output: t.id, t.value
Buffers: shared hit=443
Worker 0: actual time=0.004..0.708
rows=14012 loops=1
Buffers: shared hit=62
Worker 1: actual time=0.005..0.722
rows=12430 loops=1
Buffers: shared hit=55
Worker 2: actual time=0.006..1.433
rows=26442 loops=1
Buffers: shared hit=117
Worker 3: actual time=0.005..17.246
rows=23160 loops=1
Buffers: shared hit=103
-> Function Scan on pg_catalog.pg_sleep
(cost=0.00..0.01 rows=1 width=0) (actual time=0.501..0.501 rows=1
loops=100000)
Output: pg_sleep.pg_sleep
Function Call: pg_sleep('10'::double
precision)
Worker 0: actual time=0.715..0.715
rows=1 loops=14012
Worker 1: actual time=0.806..0.807
rows=1 loops=12430
Worker 2: actual time=0.378..0.379
rows=1 loops=26442
Worker 3: actual time=0.432..0.432
rows=1 loops=23160
2021-06-02 00:41:54.369 IST [2687] LOG: duration: 10074.336 ms
*2021-06-02 00:42:00.567 IST [2687] LOG: statement: refresh materialized
view mv;*
*2021-06-02 00:42:10.611 IST [2687] LOG: duration: 10023.402 ms plan:*
* Query Text: refresh materialized view mv;*
* Aggregate (cost=2943.02..2943.03 rows=1 width=40) (actual
time=10023.331..10023.332 rows=1 loops=1)*
* Output: round(avg(t.id <http://t.id>), 0), sum(t.id
<http://t.id>)*
* Buffers: shared hit=443*
* -> Nested Loop (cost=0.00..2443.01 rows=100000 width=4)
(actual time=10005.544..10018.127 rows=100000 loops=1)*
* Output: t.id <http://t.id>*
* Buffers: shared hit=443*
* -> Function Scan on pg_catalog.pg_sleep (cost=0.00..0.01
rows=1 width=0) (actual time=10005.504..10005.505 rows=1 loops=1)*
* Output: pg_sleep.pg_sleep*
* Function Call: pg_sleep('10'::double precision)*
* -> Seq Scan on public.t (cost=0.00..1443.00 rows=100000
width=4) (actual time=0.032..5.815 rows=100000 loops=1)*
* Output: t.id <http://t.id>, t.value*
* Buffers: shared hit=443*
*2021-06-02 00:42:10.619 IST [2687] LOG: duration: 10051.366 ms*

On Wed, 2 Jun 2021 at 00:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Philip Semanchuk <philip(at)americanefficient(dot)com> writes:
> > I can confirm that it’s not waiting on a lock. In addition, through the
> AWS CPU utilization monitor I can see that the REFRESH uses one CPU/worker
> whereas the CREATE uses four. This is consistent with the EXPLAIN ANALYZE
> for the CREATE which says it uses four workers.
>
> Hm. I tried to reproduce this here, and in a simple test case I get
> parallelized plans for both CREATE and REFRESH. Are you sure the
> REFRESH is running with the same server parameter settings?
>
> >> also, can you share the plans where you see the diff.
>
> > Unless I misunderstand, there is no plan for a REFRESH.
>
> EXPLAIN isn't bright about that, but if you enable auto_explain,
> it will log the plan for a REFRESH's query.
>
> regards, tom lane
>

--
Thanks,
Vijay
Mumbai, India

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2021-06-01 19:23:11 Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?
Previous Message Tom Lane 2021-06-01 18:51:05 Re: CREATE/REFRESH MATERIALIZED VIEW planner difference?