Re: Matview size - space increased on concurrently refresh

From: Nicola Contu <nicola(dot)contu(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kaixi Luo <kaixiluo(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, Alessandro Aste <Alessandro(dot)aste(at)gtt(dot)net>
Subject: Re: Matview size - space increased on concurrently refresh
Date: 2019-07-17 10:31:43
Message-ID: CAMTZZh30qLAQx_qEjep9ozXYgJDFfn=ukj9cOUScr-DoFjJ+Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Il giorno dom 14 lug 2019 alle ore 22:23 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> ha
scritto:

> [ please do not top-post in your replies, it makes the conversation hard
> to follow ]
>
> Nicola Contu <nicola(dot)contu(at)gmail(dot)com> writes:
> > Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo(at)gmail(dot)com> ha scritto:
> >> This is normal and something to be expected. When refreshing the
> >> materialized view, the new data is written to a disk and then the two
> >> tables are diffed. After the refresh finishes, your view size should go
> >> back to normal.
>
> > It does not. That's the issue.
> > It always increases of 120mb and it reached 12gb instead of just 180mb.
>
> A concurrent matview refresh will necessarily leave behind two copies
> of any rows it changes, just like any other row-update operation in
> Postgres. Once there are no concurrent transactions that can "see"
> the old row copies, they should be reclaimable by vacuum.
>
> Since you're not seeing autovacuum reclaim the space automatically,
> I hypothesize that you've got autovacuum turned off or dialed down
> to unrealistically non-aggressive settings. Or possibly you have
> old open transactions that are preventing reclaiming dead rows
> (because they can still possibly "see" those rows). Either of those
> explanations should imply that you're getting similar bloat in every
> other table and matview, though.
>
> You might want to look into pg_stat_all_tables to see what it says
> about the last_autovacuum time etc. for that matview. Another source
> of insight is to do a manual "vacuum verbose" on the matview and see
> what that says about removable and nonremovable rows.
>
> regards, tom lane
>

This matview has nothing strange and nothign custom.
We can replicate the matview that is not used by anyone.

cmdv3=# vacuum (full,analyze,verbose) public.matview_nm_connections;
INFO: vacuuming "public.matview_nm_connections"
INFO: "matview_nm_connections": found 0 removable, 295877 nonremovable row
versions in 33654 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.45 s, system: 0.43 s, elapsed: 1.24 s.
INFO: analyzing "public.matview_nm_connections"
INFO: "matview_nm_connections": scanned 16986 of 16986 pages, containing
295877 live rows and 0 dead rows; 30000 rows in sample, 295877 estimated
total rows
VACUUM

This is an example of full and verbose vacuum. Everytime I refresh it I get
the size increased.

See stats from the pg_stat_all_tables :

cmdv3=# select * from pg_stat_all_tables where relname =
'matview_nm_connections';
relid | schemaname | relname | seq_scan | seq_tup_read
| idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del |
n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |
last_vacuum | last_autovacuum |
last_analyze | last_autoanalyze
| vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
------------+------------+------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------
------+--------------+------------------+---------------+-------------------
3466831733 | public | matview_nm_connections | 3725 | 540992219
| 33235 | 255113 | 96874161 | 0 | 95692276 |
0 | 295877 | 0 | 0 | 2019-07-12
11:58:39.198049+00 | 2019-07-16 11:07:02.765612+00 | 2019-07-17
10:28:08.819679+00 | 2019-07-16 11:03:32.4895
73+00 | 5 | 29 | 11 | 17
(1 row)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Weatherby,Gerard 2019-07-17 10:42:04 Re: How to run a task continuously in the background
Previous Message Dirk Mika 2019-07-17 09:57:57 Re: How to run a task continuously in the background