Re: [EXTERNAL]Re: Refresh Materialized View Issue

From: Jim Nasby <jim(dot)nasby(at)gmail(dot)com>
To: Jeremiah Bauer <jbauer(at)agristats(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: [EXTERNAL]Re: Refresh Materialized View Issue
Date: 2024-01-12 19:11:05
Message-ID: f151126e-ffb8-4481-9b1f-8b43950b6b2f@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 1/12/24 6:34 AM, Jeremiah Bauer wrote:
>
> That'd be a band-aid at best, because we know that the query used to
> define the materialized view runs in a reasonable amount of time on it's
> own, as does a CTAS. So either the REFRESH is doing something odd when
> writing into the new relation (which looking at the code seems very
> unlikely), or REFRESH is getting a different query plan for some reason.
> Unfortunately, I don't know of any easy way to get the query plan for
> the REFRESH (it might be possible via gdb, but I'm not sure). We do at
> least know that the REFRESH is using parallel workers.
>
> Can you post the output of EXPLAIN ANALYZE for the SELECT? That might
> provide some clues.
>
> Sure, here is the explain analyze for the select:
>
> HashAggregate  (cost=123986470.20..129486707.63 rows=164493082 width=24)
> (actual time=697250.385..741548.965 rows=59015171 loops=1)
>   Group Key: id1, id2, id3
>   Planned Partitions: 64  Batches: 65  Memory Usage: 328209kB  Disk
> Usage: 6750176kB
>   ->  Gather  (cost=64653301.50..107228737.47 rows=328986164 width=24)
> (actual time=357598.331..594226.355 rows=161151623 loops=1)
>         Workers Planned: 2
>         Workers Launched: 2
>         ->  HashAggregate  (cost=64652301.50..74329121.07
> rows=164493082 width=24) (actual time=357572.082..578038.457
> rows=53717208 loops=3)
>               Group Key: id1, id2, id3
>               Planned Partitions: 64  Batches: 65  Memory Usage:
> 328209kB  Disk Usage: 25774088kB
>               Worker 0:  Batches: 65  Memory Usage: 328209kB  Disk
> Usage: 25375784kB
>               Worker 1:  Batches: 65  Memory Usage: 328209kB  Disk
> Usage: 25382936kB
>               ->  Parallel Seq Scan on large_table
>  (cost=0.00..29740358.40 rows=685387840 width=24) (actual
> time=12.954..99596.289 rows=548310252 loops=3)
> Planning Time: 5.380 ms
> Execution Time: 745750.371 ms

Ok, so that's using a parallel query as well, just like REFRESH, so no
help there.

At this point I think you'd need to do some OS-level investigation using
a tool like perf to determine what the parent process is doing once the
workers finish.

BTW, I did just come across a script[1] that will print the query plan
for a running query. Based on what you've said I suspect it wouldn't
tell us much here, but I wanted to mention it.

1: https://github.com/StarfishStorage/explain-running-query
--
Jim Nasby, Data Architect, Austin TX

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Ventimiglia 2024-01-12 19:34:36 How to redirect output from PostgreSQL pg_recvlogical to a file or a pipe?
Previous Message Kiran K V 2024-01-12 17:47:27 Re: Issue with loading unicode characters with copy command