Re: Refresh Materialized View Issue

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: Jeremiah Bauer <jbauer(at)agristats(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Refresh Materialized View Issue
Date: 2024-01-11 20:37:50
Message-ID: CANzqJaDY=0fuVrNveyZJpf+-Z=HhnisqDU31N1d4+bYAPjGTGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Jan 11, 2024 at 3:31 PM Jeremiah Bauer <jbauer(at)agristats(dot)com> wrote:

> Hello all,
>
> We are having an issue with a materialized view refresh never finishing,
> any help is appreciated. It will run at 100% CPU and no IO traffic
> indefinitely after about 15 minutes of parallel workers and the parent
> worker consuming CPU and IO.
>
> PostgreSQL Version: 15.5
>
> Due to some design decisions, we have a large table that we need a
> distinct list of associated id's out of for an application interface. This
> table has approximately 1,650,000,000 rows with a table size of 175GB and
> 250GB of indexes.
>
> I anonymized the column names in the table structure, but this is the
> table structure of our large table.
>
> create table public.large_table(
> large_table_id bigint NOT NULL DEFAULT
> nextval(public.large_table_id_seq'::regclass),
> nmrc numeric(30,5),
> id1 bigint NOT NULL,
> id2 bigint NOT NULL,
> id3 bigint NOT NULL,
> id4 bigint NOT NULL,
> id5 bigint NOT NULL,
> last_updt_ts timestamp without time zone,
> last_db_updt_ts timestamp without time zone,
> charval character varying(30)
>
> )
>
> The materialized view is defined as:
>
> create materialized view public.vw_distinct_list_of_ids as (
>
> select distinct id1, id2, id3 from public.large_table
>
> ) with no data;
>
> When refreshed the materialized view will contain approximately 59,000,000
> rows.
>
> It takes approximately 12 minutes to run the select statement from the
> view definition. When you refresh the materialized view for the first
> time, it will spawn several parallel workers that run at 100% CPU and
> produce some I/O for about 12 or 13 minutes, then they finish their work
> and terminate. The parent worker will continue to run at 100% CPU until I
> terminate the process. I've let it run for over 24 hours.
>
> What I find interesting is that if I do this:
>
> create table public.table_of_distinct_list_of_ids as (
>
> select distinct id1, id2, id3 from public.large_table
>
> );
>
> It will complete in 12 or 13 minutes, the same as the select statement. I
> have three questions.
>

My question is: what indexes are on public.large_table? Hopefully there's
a compound b-tree index on id1, id2, id3.

>
> 1. Why is the materialized view refresh stalling and never completing?
> 2. Is there a setting I am missing that affects materialized view
> refreshes?
> 3. Should I just go with a table that is dropped and recreated
> instead? I was hoping to avoid this.
>
> A materialized view isn't too different from an unlogged table.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeremiah Bauer 2024-01-11 20:50:13 Re: Refresh Materialized View Issue
Previous Message Jeremiah Bauer 2024-01-11 20:31:01 Refresh Materialized View Issue