Prototype: Implement dead tuples xid histograms

From: Renan Alves Fonseca <renanfonseca(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Prototype: Implement dead tuples xid histograms
Date: 2025-04-16 18:38:54
Message-ID: 87sem86jep.fsf@gmail.com
Lists: pgsql-hackers

Hi hackers,

In a recent hacking workshop organized by Robert Haas, we discussed
[1]. Among the autovacuum issues presented, the useless vacuum case
caught my attention. I've started to study the relevant code and came up
with a prototype that improves the statistics system's tracking of dead
tuples.

The attached patch only partially implements the dead tuples histogram
mentioned in [1]. But, since I'm a beginner, I thought it would be good
to get early feedback, just to make sure I'm not doing anything very
wrong.

My initial idea was to implement a growing histogram as a linked list
of bins, exploiting the fact that most dead tuples are added to the
last bin. Then I realized that there are no other dynamic data
structures in pg_stats and that such a list would be harder to
serialize. That's why I chose to implement the histogram as a static
data structure inside one of the existing pg_stats data structures. It
does require a little more logic to maintain the histogram, but it is
well integrated into the overall pg_stats architecture.
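
To make the trade-off concrete, here is a minimal standalone sketch of
a fixed-size histogram. It is not the code from the attached patch: the
names DeadXidHist and dead_hist_add, the bin count, and the policy of
merging the two oldest bins when the array is full are all assumptions
chosen for illustration. It also compares xids as plain integers and
ignores wraparound.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEAD_HIST_NBINS 4		/* hypothetical size, small for illustration */

/*
 * Hypothetical fixed-size histogram. Bin i counts dead tuples whose xid
 * is <= bounds[i] and greater than bounds[i - 1] (if any). Because the
 * struct has no pointers, it can be copied or serialized as a flat blob.
 */
typedef struct DeadXidHist
{
	int			nused;						/* bins currently in use */
	uint32_t	bounds[DEAD_HIST_NBINS];	/* inclusive upper xid per bin */
	uint64_t	freqs[DEAD_HIST_NBINS];		/* dead tuples per bin */
} DeadXidHist;

/* Add ntuples dead tuples attributed to xid (wraparound is ignored). */
void
dead_hist_add(DeadXidHist *h, uint32_t xid, uint64_t ntuples)
{
	assert(h->nused >= 0 && h->nused <= DEAD_HIST_NBINS);

	/* An existing bin already covers this xid: just bump its counter. */
	for (int i = 0; i < h->nused; i++)
	{
		if (xid <= h->bounds[i])
		{
			h->freqs[i] += ntuples;
			return;
		}
	}

	/* xid is newer than every bin, so a new last bin is needed. */
	if (h->nused == DEAD_HIST_NBINS)
	{
		/* No free slot: fold the oldest bin into the second oldest. */
		h->freqs[1] += h->freqs[0];
		memmove(&h->bounds[0], &h->bounds[1],
				(DEAD_HIST_NBINS - 1) * sizeof(h->bounds[0]));
		memmove(&h->freqs[0], &h->freqs[1],
				(DEAD_HIST_NBINS - 1) * sizeof(h->freqs[0]));
		h->nused--;
	}

	h->bounds[h->nused] = xid;
	h->freqs[h->nused] = ntuples;
	h->nused++;
}
```

Starting from a zero-initialized DeadXidHist, each call either
increments an existing bin or appends a new last bin. The merge step is
the "little more logic" that a static layout needs and a linked list
would avoid.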

As discussed in the hacking workshop, one of the problems is capturing
the exact xmin of the dead tuple. In my tests, I've observed that,
outside of a transaction, xmin corresponds to
GetCurrentTransactionId(). But inside a transaction, xmin receives
incrementing xids on successive DML statements. Capturing xids for every
statement inside a transaction seems overkill, so I decided to
attribute the highest xmin/xid of a transaction to all dead tuples
of that transaction.
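
The attribution rule can be sketched as a small per-transaction
accumulator. This is a standalone illustration, not the patch's code:
XactDeadTuples and xact_dead_record are hypothetical names, 0 stands in
for an invalid xid, and xids are compared as plain integers, whereas
real backend code would need a wraparound-aware comparison such as
TransactionIdFollows().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-transaction accumulator: one xid and one counter for
 * the whole transaction instead of one pair per statement.
 */
typedef struct XactDeadTuples
{
	uint32_t	max_xid;	/* highest xid seen so far; 0 means none yet */
	uint64_t	ndead;		/* dead tuples produced by the transaction */
} XactDeadTuples;

/* Record ndead dead tuples produced by a statement that ran under stmt_xid. */
void
xact_dead_record(XactDeadTuples *x, uint32_t stmt_xid, uint64_t ndead)
{
	assert(stmt_xid != 0);	/* 0 is reserved as "no xid" in this sketch */

	/* Keep only the highest xid (plain comparison, wraparound ignored). */
	if (stmt_xid > x->max_xid)
		x->max_xid = stmt_xid;

	x->ndead += ndead;
}
```

At the end of the transaction, the single pair (max_xid, ndead) would
be what gets added to the histogram, so the cost per transaction is
constant regardless of how many statements it ran.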

To see the statistics for a table t1, we run:

    select pg_stat_get_dead_tuples_xid_freqs('t1'::regclass),
           pg_stat_get_dead_tuples_xid_bounds('t1'::regclass);

Then, to verify that the bounds make sense, I've used:

    select xmin from t1;

In this version, the removal of dead tuples is not yet implemented, so
these histograms only grow.

I would really appreciate any kind of feedback.

Best regards,
Renan Fonseca

[1] How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing
That? (PGConf.dev 2024)

Attachment Content-Type Size
0001-Implement-dead-tuples-xid-histograms.patch text/x-patch 16.5 KB
