Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-06 20:06:23
Message-ID: 20210606200623.GW14099@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 06, 2021 at 07:26:22PM +0200, Matthias van de Meent wrote:
> I think it would be helpful for further debugging if we would have the
> state of the all tuples on that page (well, the tuple headers with
> their transactionids and their line pointers), as that would help with
> determining if my suspicion could be correct.

This is the state of the page after I killed the cluster and started a
postmaster on a copy of its datadir, with autovacuum=off:

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(get_raw_page('pg_statistic', 75)) ;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+-----------+-----------+----------+---------+-------------+------------+--------+----------------------------------+-------
1 | 0 | 3 | 0 | | | | | | | | |
2 | 0 | 3 | 0 | | | | | | | | |
3 | 0 | 3 | 0 | | | | | | | | |
4 | 7624 | 1 | 564 | 913726913 | 913730328 | 0 | (83,19) | 31 | 8451 | 32 | 11111111111111111111101000100000 |
5 | 6432 | 1 | 1188 | 913726913 | 913730328 | 0 | (83,20) | 31 | 8451 | 32 | 11111111111111111111110100110000 |
6 | 6232 | 1 | 195 | 913726913 | 913730328 | 0 | (83,21) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
7 | 6032 | 1 | 195 | 913726913 | 913730328 | 0 | (83,22) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
8 | 5848 | 1 | 181 | 913726913 | 913730328 | 0 | (83,23) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
9 | 5664 | 1 | 181 | 913726913 | 913730328 | 0 | (81,13) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
10 | 5488 | 1 | 176 | 913726913 | 913730328 | 0 | (81,14) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
11 | 5312 | 1 | 176 | 913726913 | 913730328 | 0 | (82,13) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
12 | 5128 | 1 | 181 | 913726913 | 913730328 | 0 | (79,57) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
13 | 4952 | 1 | 176 | 913726913 | 913730328 | 0 | (80,25) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
14 | 4776 | 1 | 176 | 913726913 | 913730328 | 0 | (80,26) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
15 | 4600 | 1 | 176 | 913726913 | 913730328 | 0 | (84,1) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
16 | 4424 | 1 | 176 | 913726913 | 913730328 | 0 | (84,2) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
17 | 4248 | 1 | 176 | 913726913 | 913730328 | 0 | (84,3) | 31 | 8451 | 32 | 11111111111111111111111000100000 |
18 | 2880 | 1 | 1364 | 913726913 | 913730328 | 0 | (84,4) | 31 | 8451 | 32 | 11111111111111111111110100110000 |
19 | 2696 | 1 | 179 | 913726914 | 0 | 0 | (75,19) | 31 | 10499 | 32 | 11111111111111111111111000100000 |
20 | 2520 | 1 | 176 | 913726914 | 0 | 0 | (75,20) | 31 | 10499 | 32 | 11111111111111111111111000100000 |
21 | 2336 | 1 | 179 | 913726914 | 0 | 0 | (75,21) | 31 | 10499 | 32 | 11111111111111111111111000100000 |
(21 rows)

(In the interest of full disclosure, this was a dumb cp -a of the live datadir
where the processes had been stuck for a day, and where I was unable to open a
client session).

8451 = HEAP_KEYS_UPDATED + 259 atts?

Note that after I vacuum pg_statistic, it looks like this:

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(get_raw_page('pg_statistic', 75));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+-----------+--------+----------+---------+-------------+------------+--------+----------------------------------+-------
1 | 0 | 0 | 0 | | | | | | | | |
...
18 | 0 | 0 | 0 | | | | | | | | |
19 | 8008 | 1 | 179 | 913726914 | 0 | 0 | (75,19) | 31 | 10499 | 32 | 11111111111111111111111000100000 |
20 | 7832 | 1 | 176 | 913726914 | 0 | 0 | (75,20) | 31 | 10499 | 32 | 11111111111111111111111000100000 |
21 | 7648 | 1 | 179 | 913726914 | 0 | 0 | (75,21) | 31 | 10499 | 32 | 11111111111111111111111000100000 |

ts=# VACUUM VERBOSE pg_statistic;
|INFO: vacuuming "pg_catalog.pg_statistic"
|INFO: scanned index "pg_statistic_relid_att_inh_index" to remove 278403 row versions
|DETAIL: CPU: user: 0.10 s, system: 0.00 s, elapsed: 0.14 s
|INFO: "pg_statistic": removed 278403 dead item identifiers in 4747 pages
|DETAIL: CPU: user: 0.12 s, system: 0.07 s, elapsed: 1.99 s
|INFO: index "pg_statistic_relid_att_inh_index" now contains 1101 row versions in 758 pages
|DETAIL: 277271 index row versions were removed.
|747 index pages were newly deleted.
|747 index pages are currently deleted, of which 0 are currently reusable.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
|INFO: "pg_statistic": found 277216 removable, 1101 nonremovable row versions in 4758 out of 4758 pages
|DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 920282115
|0 pages removed.
|Skipped 0 pages due to buffer pins, 0 frozen pages.
|CPU: user: 1.75 s, system: 0.18 s, elapsed: 6.52 s.
|INFO: "pg_statistic": truncated 4758 to 96 pages
|DETAIL: CPU: user: 0.02 s, system: 0.02 s, elapsed: 0.06 s
|INFO: vacuuming "pg_toast.pg_toast_2619"
|INFO: scanned index "pg_toast_2619_index" to remove 30 row versions
|DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
|INFO: "pg_toast_2619": removed 30 dead item identifiers in 11 pages
|DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
|INFO: index "pg_toast_2619_index" now contains 115 row versions in 2 pages
|DETAIL: 3 index row versions were removed.
|0 index pages were newly deleted.
|0 index pages are currently deleted, of which 0 are currently reusable.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
|INFO: "pg_toast_2619": found 29 removable, 115 nonremovable row versions in 43 out of 43 pages
|DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 920282115
|0 pages removed.
|Skipped 0 pages due to buffer pins, 0 frozen pages.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s.
|VACUUM

Before:
pg_catalog | pg_statistic | table | postgres | permanent | heap | 38 MB |
After:
pg_catalog | pg_statistic | table | postgres | permanent | heap | 1192 kB |

I also have nearly-intact bt f from the partial core:

#0 0x00000000004fa063 in heap_prune_chain (prstate=0x7ffe7a0cd0c0, rootoffnum=4, buffer=14138) at pruneheap.c:592
lp = <optimized out>
tupdead = <optimized out>
recent_dead = <optimized out>
rootlp = 0x2aaab2089e24
nchain = 0
tup = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}
ndeleted = 0
priorXmax = 0
htup = <optimized out>
maxoff = 21
offnum = 4
...
#1 heap_page_prune (relation=relation(at)entry=0x7f0349466d28, buffer=buffer(at)entry=14138, vistest=vistest(at)entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin(at)entry=0, old_snap_ts=old_snap_ts(at)entry=0,
report_stats=report_stats(at)entry=false, off_loc=<optimized out>, off_loc(at)entry=0x1d1b3fc) at pruneheap.c:278
itemid = 0x2aaab2089e24
ndeleted = 0
page = 0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>
offnum = 4
maxoff = 21
...
#2 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel(at)entry=0x1d1b390, buf=buf(at)entry=14138, blkno=blkno(at)entry=75, page=page(at)entry=0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>,
vistest=vistest(at)entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate(at)entry=0x7ffe7a0ced80) at vacuumlazy.c:1712
rel = 0x7f0349466d28
offnum = 4
maxoff = 21
itemid = 0x2aaab2089e24
tuple = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}
res = <optimized out>
tuples_deleted = 0
lpdead_items = 0
new_dead_tuples = 0
num_tuples = 0
live_tuples = 0
nfrozen = 0

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-06-06 20:08:52 Re: PoC/WIP: Extended statistics on expressions
Previous Message Tomas Vondra 2021-06-06 20:01:23 Re: PoC/WIP: Extended statistics on expressions