Quick Links

Very slow "bloat query"

From:	Marcin Gozdalik <gozdal(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Very slow "bloat query"
Date:	2021-05-14 11:06:33
Message-ID:	CADu1mRN1PPXO0p2HUF4NkrLSLmdiXXcafm5B6r2kk7Tawd1UoQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I am trying to use `pgmetrics` on a big (10TB+), busy (1GB/s RW) database.
It takes around 5 minutes for pgmetrics to run. I traced the problem to the
"bloat query" (version of
https://wiki.postgresql.org/wiki/Show_database_bloat) spinning in CPU,
doing no I/O.

I have traced the problem to the bloated `pg_class` (the irony: `pgmetrics`
does not collect bloat on `pg_catalog`):
`vacuum (full, analyze, verbose) pg_class;`
```
INFO: vacuuming "pg_catalog.pg_class"
INFO: "pg_class": found 1 removable, 7430805 nonremovable row versions in
158870 pages
DETAIL: 7429943 dead row versions cannot be removed yet.
CPU 1.36s/6.40u sec elapsed 9.85 sec.
INFO: analyzing "pg_catalog.pg_class"
INFO: "pg_class": scanned 60000 of 158869 pages, containing 295 live rows
and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
VACUUM
```

`pg_class` has so many dead rows because the workload is temp-table heavy
(creating/destroying 1M+ temporary tables per day) and has long running
analytics queries running for 24h+.

PG query planner assumes that index scan on `pg_class` will be very quick
and plans Nested loop with Index scan. However, the index scan has 7M dead
tuples to filter out and the query takes more than 200 seconds (
https://explain.depesz.com/s/bw2G)

If I create a temp table from `pg_class` to contain only the live tuples:
```
CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid,* from pg_class;
CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index ON
pg_class_alive(relname, relnamespace);
CREATE INDEX pg_class_tblspc_relfilenode_index ON
pg_class_alive(reltablespace, relfilenode);
ANALYZE pg_class_alive;
```

and run the bloat query on `pg_class_alive` instead of `pg_class`:
```
SELECT
nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,0) AS reltuples,
COALESCE(cc.relpages,0) AS relpages,
COALESCE(CEIL((cc.reltuples*((datahdr+8-
(CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8
END))+nullhdr2+4))/(8192-20::float)),0) AS otta
FROM
pg_class_alive cc
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <>
'information_schema'
LEFT JOIN
(
SELECT
foo.nspname,foo.relname,
(datawidth+32)::numeric AS datahdr,
(maxfracsum*(nullhdr+8-(case when nullhdr%8=0 THEN 8 ELSE nullhdr%8
END))) AS nullhdr2
FROM (
SELECT
ns.nspname, tbl.relname,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS
datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
23+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename
= tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname
WHERE att.attnum > 0 AND tbl.relkind='r'
GROUP BY 1,2
) AS foo
) AS rs
ON cc.relname = rs.relname AND nn.nspname = rs.nspname
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class_alive c2 ON c2.oid = i.indexrelid
```

it runs in 10s, 20x faster (https://explain.depesz.com/s/K4SH)

The rabbit hole probably goes deeper (e.g. should do the same for
pg_statistic and pg_attribute and create a new pg_stats view).

I am not able (at least not quickly) change the amount of temporary tables
created or make the analytics queries finish quicker. Apart from the above
hack of filtering out live tuples to a separate table is there anything I
could do?

Thank you,
Marcin Gozdalik

--
Marcin Gozdalik

Responses

Re: Very slow "bloat query" at 2021-05-14 11:50:54 from Gilles Darold
Re: Very slow "bloat query" at 2021-05-14 12:07:49 from Imre Samu
Re: Very slow "bloat query" at 2021-05-14 15:04:02 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Gilles Darold	2021-05-14 11:50:54	Re: Very slow "bloat query"
Previous Message	Manoj Kumar	2021-05-14 06:51:36	RE: Re: PostgreSQL blocked locks query