From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date: | 2025-03-12 00:07:38 |
Message-ID: | ihb3ggccagv34lpm4ninuq5gkvhi5fuikdzr6677spevp7fxlu@bj6ryi4jkjf4 |
Lists: | pgsql-hackers |
Hi,
On 2025-03-10 19:45:38 -0400, Melanie Plageman wrote:
> From 7b35b1144bddf202fb4d56a9b783751a0945ba0e Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 10 Mar 2025 17:17:38 -0400
> Subject: [PATCH v35 1/5] Increase default effective_io_concurrency to 16
>
> The default effective_io_concurrency has been 1 since it was introduced
> in b7b8f0b6096d2ab6e. Referencing the associated discussion [1], it
> seems 1 was chosen as a conservative value that seemed unlikely to cause
> regressions.
16 years...
> Experimentation on high latency cloud storage as well as fast, local
> nvme storage (see Discussion link) shows that even slightly higher values
> improve query timings substantially. 1 actually performs worse than 0.
> With effective_io_concurrency 1, we are not prefetching enough to avoid
> I/O stalls, but we are issuing extra syscalls.
Makes sense.
> Moreover, when bitmap heap scan is converted to using the read stream
> API, a prefetch distance of 1 will prevent read combining which is quite
> detrimental to performance.
Hm? This one surprises me. Doesn't the read stream code take some pains to
still perform IO combining when effective_io_concurrency=1? It does work for
seqscans, for example?
> The new default is 16, which should be more appropriate in the general
> case while still avoiding flooding low IOPs devices with I/O requests.
Maybe s/in the general case/for common hardware/?
> [1] https://www.postgresql.org/message-id/flat/FDDBA24E-FF4D-4654-BA75-692B3BA71B97%40enterprisedb.com
>
> Discussion: https://postgr.es/m/CAAKRu_Z%2BJa-mwXebOoOERMMUMvJeRhzTjad4dSThxG0JLXESxw%40mail.gmail.com
> ---
> doc/src/sgml/config.sgml | 38 +++++++++----------
> src/backend/utils/misc/postgresql.conf.sample | 2 +-
> src/include/storage/bufmgr.h | 2 +-
> 3 files changed, 19 insertions(+), 23 deletions(-)
>
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index d2fa5f7d1a9..8c4409fc8bf 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2577,36 +2577,32 @@ include_dir 'conf.d'
> Sets the number of concurrent disk I/O operations that
> <productname>PostgreSQL</productname> expects can be executed
> simultaneously. Raising this value will increase the number of I/O
> - operations that any individual <productname>PostgreSQL</productname> session
> - attempts to initiate in parallel. The allowed range is 1 to 1000,
> - or zero to disable issuance of asynchronous I/O requests. Currently,
> - this setting only affects bitmap heap scans.
> + operations that any individual <productname>PostgreSQL</productname>
> + session attempts to initiate in parallel. The allowed range is
> + <literal>1</literal> to <literal>1000</literal>, or
> + <literal>0</literal> to disable issuance of asynchronous I/O requests.
> + The default is <literal>16</literal> on supported systems, otherwise
> + <literal>0</literal>. Currently, this setting only affects bitmap heap
> + scans.
> </para>
I'd probably use this as an occasion to remove the "Currently, this setting
only affects bitmap heap scans" sentence - afaict it's been wrong for a while
and got more wrong since vacuum started to use read streams...
> <para>
> - For magnetic drives, a good starting point for this setting is the
> - number of separate
> - drives comprising a RAID 0 stripe or RAID 1 mirror being used for the
> - database. (For RAID 5 the parity drive should not be counted.)
> - However, if the database is often busy with multiple queries issued in
> - concurrent sessions, lower values may be sufficient to keep the disk
> - array busy. A value higher than needed to keep the disks busy will
> - only result in extra CPU overhead.
> - SSDs and other memory-based storage can often process many
> - concurrent requests, so the best value might be in the hundreds.
Afaict this whole paragraph was *never* correct... Obviously that's not
criticism of your removing it ;)
> + Higher values will have the most impact on higher latency storage
> + where queries otherwise experience noticeable I/O stalls and on
> + devices with high IOPs. Higher values than needed to satisfy the query
> + or keep the device busy can be expected to only introduce extra CPU
> + overhead.
> </para>
I'd say unnecessarily high values also can increase IO latency.
Greetings,
Andres Freund