Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-06 17:26:22
Message-ID: CAEze2Whrnkcr_era2p7X-tZyFUE_3t9pKfLYWSYW70Ssis3sqw@mail.gmail.com
Lists: pgsql-hackers

On Sun, 6 Jun 2021 at 18:35, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> An internal instance was rejecting connections with "too many clients".
> I found a bunch of processes waiting on a futex and I was going to upgrade the
> kernel (3.10.0-514) and dismiss the issue.
>
> However, I also found an autovacuum chewing 100% CPU, and it appears the
> problem is actually because autovacuum has locked a page of pg_statistic, and
> every other process then gets stuck waiting in the planner. I checked a few
> and found these:

My suspicion is that, for some tuple on that page,
HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD, i.e. it reports a
tuple that should already have been cleaned up by heap_page_prune but
wasn't. This would result in an infinite loop in lazy_scan_prune: the
condition on vacuumlazy.c:1800 will always be true, but the retry will
never do the job it is expected to do.
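
To make the suspected failure mode concrete, here is a toy model of
it. This is only a sketch under my own assumptions, not the actual
vacuumlazy.c code: ToyTuple, toy_prune and toy_scan_prune are
hypothetical names, and the two boolean flags merely stand in for the
pruning pass and the later visibility check disagreeing about the
same tuple. Unlike the real loop as I suspect it behaves, the model
caps the number of retries so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap tuple; not a PostgreSQL struct. */
typedef struct
{
    bool present;           /* tuple has not been pruned away yet */
    bool prune_sees_dead;   /* verdict of the pruning pass */
    bool vacuum_sees_dead;  /* verdict of the later visibility check */
} ToyTuple;

/* Pruning pass: removes only the tuples that it judges dead itself. */
void
toy_prune(ToyTuple *page, size_t ntuples)
{
    for (size_t i = 0; i < ntuples; i++)
    {
        if (page[i].present && page[i].prune_sees_dead)
            page[i].present = false;
    }
}

/*
 * Prune, then re-check every remaining tuple.  If the re-check still
 * finds a dead tuple, start over.  Returns the number of retries that
 * were needed, or -1 if max_retries was exhausted without progress.
 */
int
toy_scan_prune(ToyTuple *page, size_t ntuples, int max_retries)
{
    assert(max_retries >= 0);

    for (int attempt = 0; attempt <= max_retries; attempt++)
    {
        bool need_retry = false;

        toy_prune(page, ntuples);

        for (size_t i = 0; i < ntuples; i++)
        {
            if (page[i].present && page[i].vacuum_sees_dead)
            {
                /* Pruning left behind a tuple the re-check calls dead. */
                need_retry = true;
                break;
            }
        }

        if (!need_retry)
            return attempt;
    }

    return -1;
}
```

When both verdicts agree, the first pass succeeds and the function
returns 0. A single tuple with prune_sees_dead = false and
vacuum_sees_dead = true never gets removed, so every retry ends up in
the same state and the function returns -1; without the retry cap it
would spin forever, which is what the 100% CPU autovacuum looks like.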

Apart from reporting this suspicion, I sadly can't help you much
further, as my knowledge of and experience with vacuum and snapshot
horizons are limited and probably won't be of much use here.

I think it would be helpful for further debugging if we had the
state of all the tuples on that page (well, the tuple headers with
their transaction IDs and their line pointers), as that would help
determine whether my suspicion could be correct.

With regards,

Matthias van de Meent
