From: Gilles Darold <gilles(at)darold(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Very slow "bloat query"
Date: 2021-05-14 11:50:54
Message-ID: 804ea9ef-e9d4-e104-a9f1-d9a94e36013c@darold.net
Lists: pgsql-performance
On 14/05/2021 at 13:06, Marcin Gozdalik wrote:
> Hi
>
> I am trying to use `pgmetrics` on a big (10TB+), busy (1GB/s RW)
> database. It takes around 5 minutes for pgmetrics to run. I traced the
> problem to the "bloat query" (a version of
> https://wiki.postgresql.org/wiki/Show_database_bloat)
> spinning in CPU, doing no I/O.
>
> I have traced the problem to the bloated `pg_class` (the irony:
> `pgmetrics` does not collect bloat on `pg_catalog`):
> `vacuum (full, analyze, verbose) pg_class;`
> ```
> INFO: vacuuming "pg_catalog.pg_class"
> INFO: "pg_class": found 1 removable, 7430805 nonremovable row
> versions in 158870 pages
> DETAIL: 7429943 dead row versions cannot be removed yet.
> CPU 1.36s/6.40u sec elapsed 9.85 sec.
> INFO: analyzing "pg_catalog.pg_class"
> INFO: "pg_class": scanned 60000 of 158869 pages, containing 295 live
> rows and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
> VACUUM
> ```
>
> `pg_class` has so many dead rows because the workload is temp-table
> heavy (creating/destroying 1M+ temporary tables per day) and has long
> running analytics queries running for 24h+.
>
> The PG query planner assumes that an index scan on `pg_class` will be
> very quick and plans a nested loop with an index scan. However, the
> index scan has 7M dead tuples to filter out and the query takes more
> than 200 seconds (https://explain.depesz.com/s/bw2G).
>
> If I create a temp table from `pg_class` to contain only the live tuples:
> ```
> CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid,* from pg_class;
> CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
> CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index ON
> pg_class_alive(relname, relnamespace);
> CREATE INDEX pg_class_tblspc_relfilenode_index ON
> pg_class_alive(reltablespace, relfilenode);
> ANALYZE pg_class_alive;
> ```
>
> and run the bloat query on `pg_class_alive` instead of `pg_class`:
> ```
> SELECT
> nn.nspname AS schemaname,
> cc.relname AS tablename,
> COALESCE(cc.reltuples,0) AS reltuples,
> COALESCE(cc.relpages,0) AS relpages,
> COALESCE(CEIL((cc.reltuples*((datahdr+8-
> (CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8
> END))+nullhdr2+4))/(8192-20::float)),0) AS otta
> FROM
> pg_class_alive cc
> JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <>
> 'information_schema'
> LEFT JOIN
> (
> SELECT
> foo.nspname,foo.relname,
> (datawidth+32)::numeric AS datahdr,
> (maxfracsum*(nullhdr+8-(case when nullhdr%8=0 THEN 8 ELSE
> nullhdr%8 END))) AS nullhdr2
> FROM (
> SELECT
> ns.nspname, tbl.relname,
> SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS
> datawidth,
> MAX(coalesce(null_frac,0)) AS maxfracsum,
> 23+(
> SELECT 1+count(*)/8
> FROM pg_stats s2
> WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND
> s2.tablename = tbl.relname
> ) AS nullhdr
> FROM pg_attribute att
> JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
> JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
> LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
> AND s.tablename = tbl.relname
> AND s.inherited=false
> AND s.attname=att.attname
> WHERE att.attnum > 0 AND tbl.relkind='r'
> GROUP BY 1,2
> ) AS foo
> ) AS rs
> ON cc.relname = rs.relname AND nn.nspname = rs.nspname
> LEFT JOIN pg_index i ON indrelid = cc.oid
> LEFT JOIN pg_class_alive c2 ON c2.oid = i.indexrelid
> ```
>
> it runs in 10 s, 20x faster (https://explain.depesz.com/s/K4SH).
>
> The rabbit hole probably goes deeper (e.g. the same should probably be
> done for `pg_statistic` and `pg_attribute`, with a new `pg_stats` view
> on top).
>
> I am not able (at least not quickly) to change the number of temporary
> tables created or to make the analytics queries finish more quickly.
> Apart from the above hack of filtering the live tuples out into a
> separate table, is there anything else I could do?
Hi,

To avoid bloating your catalog with temporary tables you can try
https://github.com/darold/pgtt-rsl. I don't know whether its performance
will suit your workload, but at least you will not bloat the catalog
anymore. About your hack, I don't see any other solution except running
vacuum on the catalog tables more often, but I guess that this is already
done or is not possible. Still, not bloating the catalog to such a level
is the right solution.
--
Gilles Darold
http://www.darold.net/
Next message: Imre Samu, 2021-05-14 12:07:49, Re: Very slow "bloat query"
Previous message: Marcin Gozdalik, 2021-05-14 11:06:33, Very slow "bloat query"