From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic |
Date: | 2021-06-08 11:54:41 |
Message-ID: | CAEze2Wi6WrXo_PajFmwfved1AsU1mdXdA=+NsBqZ5E3sXszX1w@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> > On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > I'll leave the instance running for a little bit before restarting (or kill-9)
> > > in case someone requests more info.
> >
> > How about dumping the page image out, and sharing it with the list?
> > This procedure should work fine from gdb:
> >
> > https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB
> >
> > I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
> > imagine that you have the instance already stuck in an infinite loop,
> > so what we'll probably see from the page image is the page after the
> > first prune and another no-progress prune.
>
> The cluster was again rejecting with "too many clients already".
>
> I was able to open a shell this time, but it immediately froze when I tried to
> tab complete "pg_stat_acti"...
>
> I was able to dump the page image, though - attached. I can send you its
> "data" privately, if desirable. I'll also try to step through this.

Could you attach a dump of lazy_scan_prune's vacrel, all the global
visibility states (GlobalVisCatalogRels, and possibly
GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
heap_page_prune's PruneState?

Additionally, the locals of lazy_scan_prune (more specifically, the
'offnum' at the point where it enters heap_page_prune) would also be
appreciated, as that helps identify the tuple involved.

I've been looking into what might have caused this, but I'm currently
stuck for lack of information about GlobalVisCatalogRels and the
PruneState.

One thing I did notice is that the t_xmax of the problematic tuples
has been exactly one lower than OldestXmin. Not weird in itself, but
curious.
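
To make that boundary concrete, here is a minimal C sketch of the modulo-2^32 comparison that XID horizon checks rely on. The names xid_t, xid_precedes and xmax_is_before_horizon are made up for illustration and are not PostgreSQL source; special XIDs (invalid, bootstrap, frozen) are deliberately ignored. Under those assumptions, an xmax of OldestXmin - 1 is the newest value that still counts as preceding OldestXmin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID; not PostgreSQL's typedef. */
typedef uint32_t xid_t;

/*
 * Simplified "a precedes b" test using modulo-2^32 arithmetic.
 * Assumption: both values are normal XIDs; special XIDs are not handled.
 */
bool xid_precedes(xid_t a, xid_t b)
{
    /* Unsigned subtraction wraps; the sign of the result gives the order. */
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * Illustrative helper: true if xmax lies strictly before the horizon.
 * With xmax == oldest_xmin - 1 this is true; with xmax == oldest_xmin
 * it is false, so "one lower" is exactly the edge of the range.
 */
bool xmax_is_before_horizon(xid_t xmax, xid_t oldest_xmin)
{
    return xid_precedes(xmax, oldest_xmin);
}
```

For example, xmax_is_before_horizon(999, 1000) holds while xmax_is_before_horizon(1000, 1000) does not. Whether this boundary has anything to do with the hang is not established here; it only shows where "one lower than OldestXmin" sits relative to the comparison.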

With regards,

Matthias van de Meent.

PS. Attached a few of my current research notes, which are mainly
comparisons between heap_prune_satisfies_vacuum and
HeapTupleSatisfiesVacuum.

Attachment | Content-Type | Size |
---|---|---|
research_notes.txt | text/plain | 1.1 KB |