RE: postgres backend process hang on " D " state

From: "James Pang (chaolpan)" <chaolpan(at)cisco(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: RE: postgres backend process hang on " D " state
Date: 2022-05-30 02:58:03
Message-ID: PH0PR11MB5191AE5287660F9692884ED7D6DD9@PH0PR11MB5191.namprd11.prod.outlook.com
Lists: pgsql-performance

Updates on your questions:

time find /pgdata /pgarchive /pgwal -ls |wc
82165 903817 8397391

real 0m1.120s
user 0m0.432s
sys 0m0.800s

ps -u postgres -O wchan=============================
PID ============================ S TTY TIME COMMAND
1951 - D ? 00:26:37 /usr/pgsql-13/bin/postmaster -D /pgdata -c config_file=/pgdata/postgresql.conf
2341 - S ? 00:00:06 postgres: logger
2361 - S ? 00:01:02 postgres: checkpointer
2362 - S ? 00:00:27 postgres: background writer
2363 - S ? 00:00:59 postgres: walwriter
2364 - S ? 00:02:00 postgres: autovacuum launcher
2365 - Z ? 00:00:04 [postmaster] <defunct>
2366 do_epoll_wait S ? 00:13:30 postgres: stats collector
2367 do_epoll_wait S ? 00:00:18 postgres: pg_cron launcher
2368 - S ? 00:00:00 postgres: logical replication launcher
1053144 - Z ? 00:05:36 [postmaster] <defunct>
1053319 - Z ? 00:05:29 [postmaster] <defunct>
1053354 - Z ? 00:05:53 [postmaster] <defunct>
1053394 - Z ? 00:05:51 [postmaster] <defunct>
...
1064387 - Z ? 00:05:13 [postmaster] <defunct>
1070257 - D ? 00:24:23 postgres: test pbwd 192.168.205.53(55886) BIND
1070258 - D ? 00:24:24 postgres: test pbwd 192.168.205.50(58910) BIND
1070259 - D ? 00:24:22 postgres: test pbwd 192.168.205.133(48754) SELECT
1070260 - Z ? 00:05:02 [postmaster] <defunct>
...

strace and gdb also hang when attached to one of these processes.
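
Because strace and gdb both attach through ptrace, they tend to block on a task in uninterruptible sleep. The state and wait channel can usually still be read from /proc without attaching. The following is a minimal sketch, assuming a Linux /proc filesystem; the function name `list_d_state` and its optional root-directory argument are hypothetical, added only for illustration:

```shell
#!/bin/sh
# List processes in uninterruptible sleep ("D") by reading /proc directly,
# without ptrace. The optional first argument (assumption, for testing)
# overrides the /proc root.
list_d_state() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/stat" ] || continue
        # The process may exit between the glob and the read; skip it then.
        stat_line=$(cat "$dir/stat" 2>/dev/null) || continue
        # Field 2 (comm) is parenthesised and may contain spaces, so strip
        # everything through the last ") " and take the next field (state).
        state=$(printf '%s\n' "$stat_line" | sed 's/^.*) //' | cut -d' ' -f1)
        [ "$state" = "D" ] || continue
        pid=${dir##*/}
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        # Print "-" when the wait channel is unavailable or empty.
        printf '%s %s %s\n' "$pid" "$state" "${wchan:--}"
    done
}
```

Calling `list_d_state` with no argument scans the live /proc. For any PID it reports, running `cat /proc/PID/stack` as root may show the kernel stack, if the running kernel exposes that file.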

Regards,

James

-----Original Message-----
From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Sent: Monday, May 30, 2022 10:20 AM
To: James Pang (chaolpan) <chaolpan(at)cisco(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: postgres backend process hang on " D " state

On Mon, May 30, 2022 at 01:19:56AM +0000, James Pang (chaolpan) wrote:
> 1. extensions
> shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
> 2. psql cannot log in now, it hangs there too, so we cannot check
> anything from the pg_stat_* views.
> 3. One main app user and 2 schemas, no long-running transactions.
> 4. We use /pgdata; it's on XFS over LVM (RHEL 8.4), on shared storage, not the root filesystem.
> /dev/mapper/pgdatavg-pgdatalv 500G 230G 271G 46% /pgdata
> /dev/mapper/pgdatavg-pgarchivelv 190G 1.5G 189G 1% /pgarchive
> /dev/mapper/pgdatavg-pgwallv 100G 34G 67G 34% /pgwal

What are the LVM PVs? Is it a scsi/virt device? Or iscsi/drbd/???

I didn't hear back on whether there are any kernel errors.
Is the storage broken/stuck/disconnected?
Can you run "time find /pgdata /pgarchive /pgwal -ls |wc"?

Could you run "ps -u postgres -O wchan============================="?

Can you strace one of the stuck backends?

It sounds like you'll have to restart the service or VM (forcibly if necessary) to resolve the immediate issue, and then collect the other info. After that, leave a "psql" session open so that, if the problem recurs, you can try to check pg_stat_activity and other DB info.
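
One way to keep such a session open is to let a single psql connection re-run a pg_stat_activity query on a timer, so that snapshots already exist if new logins start to hang. This is a rough sketch rather than a tested procedure: the function name `activity_watch_script`, the 10-second default, the database name, and the log file are all assumptions.

```shell
#!/bin/sh
# Emit a psql script that re-runs a pg_stat_activity snapshot every
# N seconds (default 10) using psql's \watch meta-command.
# The column choice is illustrative, not prescribed by the thread.
activity_watch_script() {
    interval="${1:-10}"
    cat <<EOF
SELECT now(), pid, state, wait_event_type, wait_event,
       backend_type, left(query, 60) AS query
FROM pg_stat_activity
ORDER BY pid
\watch $interval
EOF
}

# Usage (connection settings and log path are assumptions):
#   activity_watch_script 10 | psql -X -d postgres >> activity.log
```

The query is sent over one long-lived connection, so it does not depend on new logins succeeding once the problem recurs; whether that session itself keeps responding during the hang is not something this sketch can guarantee.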
