Re: Disk wait problem...

From: pf(at)pfortin(dot)com
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Disk wait problem...
Date: 2023-10-26 19:50:16
Message-ID: 20231026155016.6b739a7e.pfortin@pfortin.com

Hi Peter,

All of the following is based on using SQL_workbench/J (WB) (Mageia Linux
has not provided a viable pgAdmin4 package); WB is set up to autoload the
table row count and 10 rows. I'm sticking to one set of files, each
roughly 33.6M rows.

I've been doing a lot of digging and found more interesting info...

At first, I thought the disk was in disk wait for the entire delay; this
_appeared_ to be the case, since the system journal logged everything
about each query, with every query looking the same and no errors.

I was going to provide some logs; but suffice it to say that the logs,
combined with running iostat every second, have been enlightening...

Each time I switch from one table to another, the results display in
anywhere from:
* less than half a second, to
* one or more seconds, to
* 5+ minutes.
So far, none has failed to eventually return; just l..o..n..g delays...

This morning, I've been using iostat:
$ while true; do echo -n "$(date +%H:%M:%S) "; iostat /dev/nvme1n1 | grep nvme1n1; sleep 1; done

When the response is fast, I see iostat report ~225000 bytes read in one
burst. On the other hand, when the query takes a long time to respond, I
see the read bytes spread out over time... a little every second... So
while htop appears to report a steady D (disk wait) status, I now suspect
the individual transfers are fast, and that htop is simply likely to
catch the process in the D state most of the time.
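
(An aside for anyone reproducing this: iostat can sample extended stats
on an interval by itself; these are standard sysstat options, though the
exact column set varies a bit by version.)

$ # one extended-stats line per second for this device:
$ # r/s = reads/s, rkB/s = kB read/s, r_await = avg read latency (ms), %util = busy
$ iostat -x /dev/nvme1n1 1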

It definitely appears to be a problem with the 4TB NVMe SSD drive, which
utilities say still has 95% of its life left[1]. The drive is reporting:
Error Information Log Entries: 1,177
which is growing, so I'll be digging into that. It also reports:

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc  LBA  NSID  VS  Message
  0       1177     0  0xc002  0x4004      -    0     1   -  Invalid Field in Command
  1       1176     0  0xb001  0x4004      -    0     1   -  Invalid Field in Command
  2       1175     0  0x801a  0x4004      -    0     1   -  Invalid Field in Command
...
This list is not growing, so I'll look into it later rather than sooner.
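
For anyone following along, the raw data above should be obtainable with
something like the following (assuming smartmontools and nvme-cli are
installed):

$ sudo smartctl -a /dev/nvme1n1     # health, 'Percentage Used', error-log summary
$ sudo nvme error-log /dev/nvme1n1  # the full NVMe error-log entries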

A few moments ago, I decided to switch to the backup, which resides on a
17TB mechanical platter drive; that one is running great.

[1] SSD drives wear out with use; think of them as vehicle tires that
slowly wear down.

I also found that there is a firmware update available from:
https://www.seagate.com/support/downloads/

No idea if this will solve the problem; but...

Thanks for your suggestions.

I thought I saw that there may be instrumentation within PG that I can
access; suggestions welcome if so...
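
In the meantime, these look like plausible starting points (hedging:
pg_stat_io is PostgreSQL 16+, and the *_time columns are only populated
when track_io_timing = on):

-- cumulative read volume and read time (ms) for the current database
select datname, blks_read, blk_read_time
from pg_stat_database
where datname = current_database();

-- PostgreSQL 16+: I/O broken down by backend type and target object
select backend_type, object, reads, read_time
from pg_stat_io
order by read_time desc;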

Cheers,
Pierre

On Thu, 26 Oct 2023 11:50:00 +0200 Peter J. Holzer wrote:

>On 2023-10-23 16:31:30 -0700, Adrian Klaver wrote:
>> On 10/23/23 14:55, pf(at)pfortin(dot)com wrote:
>> > Next, I made a query list for 30 tables, turned on timing and pasted the
>> > queries. Pasting results inline:
>> > ncsbe=# select count(*) from ncvhis_2012_12_06;
>> > select count(*) from ncvhis_2016_12_03;
>> > select count(*) from ncvhis_2020_08_01;
>> > select count(*) from ncvhis_2020_10_31;
>[...]
>> > Time: 72404.786 ms (01:12.405)
>> > Time: 301.246 ms
>> > Time: 409.974 ms
>> > Time: 363.836 ms
>> > Time: 351.167 ms
>> > Time: 348.378 ms
>> > Time: 348.712 ms
>> > Time: 351.901 ms
>> > Time: 487.837 ms
>> > Time: 40987.826 ms (00:40.988)
>> > Time: 76964.281 ms (01:16.964)
>> > Time: 483.329 ms
>> > Time: 18919.267 ms (00:18.919)
>> > Time: 50896.978 ms (00:50.897)
>> > Time: 25784.000 ms (00:25.784)
>> > Time: 45407.599 ms (00:45.408)
>> > Time: 75171.609 ms (01:15.172)
>> > Time: 88871.004 ms (01:28.871)
>> > Time: 128565.127 ms (02:08.565)
>> > Time: 210248.222 ms (03:30.248)
>> > Time: 246609.561 ms (04:06.610) 4 minutes! ;p
>> > Time: 166406.283 ms (02:46.406)
>> > Time: 291483.538 ms (04:51.484) nearly 5 minutes ;p ;p
>> > Time: 2269.961 ms (00:02.270)
>> > Time: 413.219 ms
>> > Time: 433.817 ms
>> > Time: 395.844 ms
>> > Time: 7251.637 ms (00:07.252)
>> > Time: 384.328 ms
>> > Time: 384.887 ms
>> >
>> > I don't see a pattern in the above; matches the randomness I saw using
>> > WB...
>
>It would be interesting to see whether there's a difference in plan
>between the slow and the fast queries.
>
>If your experiment above is repeatable, you can just prefix every
>query with
> explain (analyze, buffers)
>
>
>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
>explain (analyze, buffers) select count(*) from ncvhis_2020_08_01;
>explain (analyze, buffers) select count(*) from ncvhis_2020_10_31;
>...
>
>If it is hard to reproduce (happens only sometimes), the best way to
>catch it is probably to enable auto_explain
><https://www.postgresql.org/docs/current/auto-explain.html>, possibly
>with the auto_explain.log_analyze option, and check the logs after it
>happens. (auto_explain is generally a good idea, IMHO.)
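
[For the archives: if I try that, my understanding is it would look
something like this in postgresql.conf; the 5s threshold is just an
example value:]

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '5s'  # log plans of statements slower than this
auto_explain.log_analyze = on         # include actual times/rows (adds overhead)
auto_explain.log_buffers = on         # include buffer usage; requires log_analyze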
>
>
>> > Before the above finished, I issued this command on another konsole...
>> >
>> > $ while true; do ls -l > /tmp/ll; date; done
>
>This is unlikely to generate noticeable disk waits. The current
>directory will be in the cache after the first ls, and the writes happen
>asynchronously.
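
[Good point. If I want a read test that really bypasses the page cache,
my understanding is that direct I/O against the raw device would do it,
e.g.:

$ sudo dd if=/dev/nvme1n1 of=/dev/null bs=1M count=4096 iflag=direct

which reads 4 GiB straight off the disk, read-only.]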
>
> hp
>
