From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Smith <greg(dot)smith(at)crunchydata(dot)com>
Subject: Re: Increase default maintenance_io_concurrency to 16
Date: 2025-03-18 23:52:04
Message-ID: CAHLJuCVJkfNozHikJYGUe+xHnPGL-+YB7RWsuC0c_XR0ytrnQg@mail.gmail.com
Lists: pgsql-committers pgsql-hackers
On Tue, Mar 18, 2025 at 5:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Is that actually a good description of what we assume? I don't know where
> that 90% is coming from?
That one's all my fault. It was an attempt to curve-fit backwards why the
4.0 number Tom set with his initial commit worked as well as it did, given
that the underlying storage was closer to 50X as slow, and I sold the idea
well enough for Bruce to follow the reasoning and commit it. Back then there
was a regular procession of people who measured the actual rate and
wondered why there was an order of magnitude difference between those
measurements and the parameter. Pointing them toward thinking in terms of
the cached read percentage did a reasonable job of redirecting them onto
why the model is more complicated than it seems. I intended to follow
that up with more measurements, only to lose the whole project into a
non-disclosure void I have only recently escaped.
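The curve-fit reasoning amounts to a one-line blended cost model. All the
numbers below are illustrative assumptions on my part, not measurements: if
raw random I/O costs roughly 50X a sequential read, but most "random" reads
are actually served from cache at roughly sequential cost, the blended cost
ends up in the neighborhood of the 4.0 default rather than anywhere near 50:

```python
def effective_random_page_cost(raw_ratio, cached_fraction, cached_cost=1.0):
    """Blend cached and uncached read costs, relative to sequential = 1.0.

    raw_ratio: cost of a truly random (uncached) read vs. a sequential read.
    cached_fraction: share of "random" reads assumed served from cache.
    """
    return cached_fraction * cached_cost + (1.0 - cached_fraction) * raw_ratio

# Illustrative only: 90% cached against 50X-slower storage blends to ~5.9,
# and ~94% cached lands right around the 4.0 default.
print(round(effective_random_page_cost(50, 0.90), 2))  # 5.9
print(round(effective_random_page_cost(50, 0.94), 2))  # 3.94
```

The point of the exercise is just that the blended number is dominated by the
cache-hit fraction, which is why the parameter never matched raw device
measurements.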
I agree with your observation that the underlying cost of a non-sequential
read stall on cloud storage is not markedly better than the original
random:sequential ratio of mechanical drives. The PG17 refactoring to
improve I/O chunking only magnifies that further.
The end of this problem I'm working on again is assembling some useful mix
of workloads such that I can try changing one of these magic constants with
higher confidence. My main working set so far is write performance
regression test sets against the Open Street Map loading workload that
I've been blogging about, plus the old read-only queries of the SELECT-only
test spaced along a scale/client grid. My experiments so far have been
around another Tom special, the maximum buffer usage count limit, which
turned into another black hole full of work I have only recently escaped. I
haven't really thought much yet about a workload set that would allow
adjusting random_page_cost. On the query side we've been pretty heads-down
on the TPC-H and ClickBench sets. I don't have buffer internals data from
those yet though; I will have to add that to the work queue.
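For what it's worth, the grid part of that is nothing more than a cross
product over scale factors and client counts. Everything below (the scale
and client values, run length, and the "bench" database name) is invented
for illustration; it just sketches the shape of the sweep using pgbench's
built-in SELECT-only script:

```python
scales = [100, 1000, 10000]
clients = [1, 4, 16, 64]

def grid_commands(scales, clients, duration=60, db="bench"):
    """Yield pgbench command lines covering the whole scale/client grid."""
    for s in scales:
        # Rebuild the dataset once per scale factor...
        yield f"pgbench -i -s {s} {db}"
        for c in clients:
            # ...then run the built-in SELECT-only script (-S) at each
            # client count for a fixed duration.
            yield f"pgbench -S -c {c} -T {duration} {db}"

cmds = list(grid_commands(scales, clients))
print(len(cmds))  # 3 inits + 12 runs = 15
```

The tedious part isn't generating the grid, it's collecting comparable
buffer/I/O internals data at each cell, which is where the real work queue
item lives.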
--
Greg Smith
Director of Open Source Strategy, Crunchy Data
greg(dot)smith(at)crunchydata(dot)com