From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Why does pg_class.reltuples count only live tuples in indexes (after VACUUM runs)? |
Date: | 2022-04-18 19:54:36 |
Message-ID: | CAH2-WzmfddQ18Su7Zp6nnBnb3CMRT8Stv8Zg=ic1OmZcRt7Q=A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Apr 18, 2022 at 12:41 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If the planner looks at index reltuples at all, it's doing so
> for cost estimation purposes, where the count including dead
> entries is probably the right thing to use.
Then why does heapam_index_build_range_scan do it the other way around?
I think that it probably doesn't matter that much in practice. The
inconsistency should be noted in update_relstats_all_indexes, though.
> If you want to make this cleaner, maybe there's a case for
> splitting reltuples into two columns. But then index AMs
> would be on the hook to determine how many of their entries
> are live, which is not really an index's concern.
The main concern behind this is that we're using
vacrel->new_rel_tuples for the IndexVacuumInfo.num_heap_tuples value
in amvacuumcleanup (but not in ambulkdelete), which is calculated
towards the end of lazy_scan_heap, like so:
/*
* Also compute the total number of surviving heap entries. In the
* (unlikely) scenario that new_live_tuples is -1, take it as zero.
*/
vacrel->new_rel_tuples =
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
I think that this doesn't really belong here; new_rel_tuples should
only be used for VACUUM VERBOSE/server log output, once we return to
heap_vacuum_rel from lazy_scan_heap. We should use
vacrel->new_live_tuples as our IndexVacuumInfo.num_heap_tuples value
in the amvacuumcleanup path (instead of new_rel_tuples). That way the
rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
taken from pg_class.reltuples (for the heap rel). Either the existing
value, or the new value.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2022-04-18 19:59:44 | Dump/Restore of non-default PKs |
Previous Message | Mark Dilger | 2022-04-18 19:46:09 | Re: Postgres perl module namespace |