Re: Outage caused by query in IPC:MessageQueueReceive wait????

From: Chris Hoover <chrish(at)aweber(dot)com>
To: Paul Barrett <tinojam(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Outage caused by query in IPC:MessageQueueReceive wait????
Date: 2024-06-14 19:47:22
Message-ID: 290BB1BE-4F67-4CE5-84B3-67DA8B43DAFA@aweber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Unfortunately I don’t. I do snapshot pg_locks and pg_stat_statements every minute (along with several other catalogs). I was able to back trace to this single query in a transaction causing issue:

```
snapped_at | 2024-06-14 10:00:00.024303-04
pid | 13900
...
backend_start | 2024-06-14 09:59:09.604535-04
xact_start | 2024-06-14 09:59:50.026618-04
query_start | 2024-06-14 09:59:50.881825-04
state_change | 2024-06-14 09:59:50.881826-04
wait_event_type | IPC
wait_event | MessageQueueReceive
state | active
backend_xid | 2334413815
backend_xmin | 2334413815
...
```

Here are the minute by minute log showing the query stuck for 16 minutes:
```
select snapped_at, pid, wait_event_type, wait_event from pg_stat_activity_snapshot_table where snapped_at >= '2024-06-14 09:50' and snapped_at < '2024-06-14 10:20' and pid = 13900 order by snapped_at
;
snapped_at | pid | wait_event_type | wait_event
-------------------------------+-------+-----------------+---------------------
2024-06-14 10:00:00.024303-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:01:00.013122-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:02:00.011187-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:03:00.010872-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:04:00.013126-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:05:00.019508-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:06:00.010661-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:07:00.011247-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:08:00.010288-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:09:00.011029-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:10:00.021545-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:11:00.011503-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:12:00.010562-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:13:00.011171-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:14:00.01087-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:15:00.01752-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:16:00.010518-04 | 13900 | IPC | MessageQueueReceive
(17 rows)
```

Thanks,

Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269
Email: chrish(at)aweber(dot)com

> On Jun 14, 2024, at 2:27 PM, Paul Barrett <tinojam(at)gmail(dot)com> wrote:
>
> Hi Chris,
> Do you have log_lock_waits enabled? If yes, check the Postgres log to see what process was causing the block/lock. I hope this helps.
>
> Thanks,
> Paul B
>
> On Fri, Jun 14, 2024 at 2:19 PM Chris Hoover <chrish(at)aweber(dot)com <mailto:chrish(at)aweber(dot)com>> wrote:
>> Yes, it appeared to be in a transaction. The pg_stat_activity.xact_start was before the query_start. Unfortunately, I don’t have the other queries. (I snapshot pg_stat_activity every minute, so it missed any prior queries to the blocker).
>>
>> Thanks,
>>
>>
>> Chris Hoover
>> Senior DBA
>> AWeber.com
>> Cell: (803) 528-2269
>> Email: chrish(at)aweber(dot)com <mailto:chrish(at)aweber(dot)com>
>>
>>
>>
>>> On Jun 14, 2024, at 2:16 PM, Ozgur Kulu <kuluozgur(at)gmail(dot)com <mailto:kuluozgur(at)gmail(dot)com>> wrote:
>>>
>>> Hi chris,
>>>
>>> Is this query in transaction block ?
>>>
>>> 14 Haz 2024 Cum 20:03 tarihinde Chris Hoover <chrish(at)aweber(dot)com <mailto:chrish(at)aweber(dot)com>> şunu yazdı:
>>>> Forgot to mention, this is on PG 16.1
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Chris Hoover
>>>> Senior DBA
>>>> AWeber.com
>>>> Cell: (803) 528-2269
>>>> Email: chrish(at)aweber(dot)com <mailto:chrish(at)aweber(dot)com>
>>>>
>>>>
>>>>
>>>>> On Jun 14, 2024, at 12:52 PM, Chris Hoover <chrish(at)aweber(dot)com <mailto:chrish(at)aweber(dot)com>> wrote:
>>>>>
>>>>> All,
>>>>>
>>>>> We had an outage today that appears to have been caused by a query in "IPC:MessageQueueReceive” wait state blocking another query from completing a truncate.
>>>>>
>>>>> Is there anyway to figure out what exactly it was waiting on other than an IPC message? I’m trying to get to the true root cause of our outage.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Chris Hoover
>>>>> Senior DBA
>>>>> AWeber.com
>>>>> Cell: (803) 528-2269
>>>>> Email: chrish(at)aweber(dot)com <mailto:chrish(at)aweber(dot)com>
>>>>>
>>>>>
>>>>>
>>>>
>>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Rui DeSousa 2024-06-14 19:54:38 Re: Keepalive
Previous Message Chris Hoover 2024-06-14 18:19:06 Re: Outage caused by query in IPC:MessageQueueReceive wait????