From: | Victor Yegorov <vyegorov(at)gmail(dot)com> |
---|---|
To: | thomas(dot)munro(at)enterprisedb(dot)com |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15290: Stuck Parallel Index Scan query |
Date: | 2018-07-23 11:42:52 |
Message-ID: | CAGnEboi_50VyFM02y7z9gpgVkdFWx5JMieeSDTaJ88qmqFBvoQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Mon, 23 Jul 2018 at 11:47, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Mon, Jul 23, 2018 at 7:57 PM, Victor Yegorov <vyegorov(at)gmail(dot)com>
> wrote:
> > - `ERROR: canceling statement due to conflict with recovery`, happened
> > right when our problematic query started, same user
>
> Ok, so that would explain how the master was cancelled. In 2877's
> stack we see that it was aborting here:
>
Right:
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
>
> #11 0x00007f539697ba5e in PostgresMain (argc=1,
> argv=argv@entry=0x7f5398d1bbc8, dbname=0x7f5398d1bb98 "coub",
> username=0x7f5398d1bbb0 "app") at
> /build/postgresql-10-U6N320/postgresql-10-10.4/build/../src/backend/tcop/postgres.c:3879
>
> That line calls AbortCurrentTransaction(), just after the call to
> EmitErrorReport() that wrote something in your log. Andres's theory
> (interrupts 'held') seems promising... perhaps there could be a bug
> where parallel index scans leak a share-locked page or something like
> that. I tried to reproduce this a bit, but no cigar so far. I wonder
> if there could be something about your bloated index that reaches
> buggy behaviour...
>
> If you happen to have a core file for a worker that is waiting in
> ConditionVariableSleep(), or it happens again, you'd be able to see if
> an LWLock is causing this by printing num_held_lwlocks.
>
No, we do not have any core files. So far I have not been able to
reproduce the situation.
I will keep monitoring. If I hit it again, what else (besides
num_held_lwlocks) should I check?
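
For the next occurrence, the num_held_lwlocks check suggested above could be
scripted ahead of time. The following is only a minimal sketch, assuming gdb
and the PostgreSQL debug symbols are installed on the host. The helper name
gdb_print_command and the example paths are hypothetical; the script only
builds the gdb command line and does not run it.

```python
import shlex


def gdb_print_command(target, expressions=("num_held_lwlocks",), binary=None):
    """Build the argv for a non-interactive gdb run that prints each expression.

    target      -- PID of a live backend (int) or path to a core file (str)
    expressions -- gdb expressions to print; only num_held_lwlocks comes from
                   the thread, anything else is the caller's assumption
    binary      -- path to the postgres executable, required for a core file
    """
    argv = ["gdb", "--batch"]
    for expr in expressions:
        # Each -ex runs one gdb command after the process or core is loaded.
        argv += ["-ex", f"print {expr}"]
    if isinstance(target, int):
        # Attach to a running process by PID.
        argv += ["-p", str(target)]
    else:
        if binary is None:
            raise ValueError("a core file needs the matching postgres binary")
        # gdb takes the executable first, then the core file.
        argv += [binary, target]
    return argv


if __name__ == "__main__":
    # 2877 is the PID mentioned in the thread; substitute the stuck worker's PID.
    print(shlex.join(gdb_print_command(2877)))
```

Attaching gdb pauses the backend while the commands run, so the command is
printed for review rather than executed automatically.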
--
Victor Yegorov