From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Confine vacuum skip logic to lazy_scan_skip
Date: 2025-01-18 22:04:15
Message-ID: c0ce1867-16a7-4998-93d6-31e715569c3e@vondra.me
Lists: pgsql-hackers
On 1/18/25 22:31, Thomas Munro wrote:
> On Sun, Jan 19, 2025 at 5:51 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> * Does it still make sense to default to eic=1? For this particular test,
>> increasing it to eic=4 often cuts the duration in half (especially on
>> NVMe storage).
>
> Maybe it wasn't a bad choice for systems with one spinning disk, but
> obviously typical hardware has changed completely since then. Bruce
> even changed the docs to recommend "hundreds" on SSDs (46eafc88). We
> could definitely consider changing the value, but this particular
> thing is using the READ_STREAM_MAINTENANCE flag, so it uses
> maintenance_io_concurrency, and that one defaults to 10. That's also
> arbitrary and quite small, but I think it means we can avoid choosing
> new defaults for now :-)
I completely lost track of which hardware is supposed to be a good fit for
low/high values of these GUCs. But weren't old spinning drives the ones that
needed a fairly long queue to optimize the movement of the heads? And IIRC
some of the newer SSDs (e.g. Optane) were marketed as not requiring very
long queues ... anyway, it seems fairly difficult to formulate a rule that
is comprehensible to our users.
FWIW the benchmarking script tweaks both the effective_io_concurrency and
maintenance_io_concurrency GUCs (it sets them to the same value). But yeah,
10 seems like a much better default for this type of storage. So for
vacuum this is probably fine, but I was thinking more about the regular
effective_io_concurrency for queries.
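Just to make sure I'm reading the flag/GUC mapping right, here's how I
understand what you describe, as a tiny self-contained sketch. The helper
name and the flag value are made up for illustration; only the GUC names
and their defaults are real:

    #include <stdio.h>

    #define READ_STREAM_MAINTENANCE 0x01        /* stand-in for the real flag bit */

    static int effective_io_concurrency = 1;    /* current default */
    static int maintenance_io_concurrency = 10; /* current default */

    /* hypothetical helper: which GUC limits a given read stream */
    static int
    stream_io_concurrency(int flags)
    {
        if (flags & READ_STREAM_MAINTENANCE)
            return maintenance_io_concurrency;  /* e.g. vacuum's streams */
        return effective_io_concurrency;        /* e.g. regular query streams */
    }

    int
    main(void)
    {
        printf("vacuum stream:  %d\n", stream_io_concurrency(READ_STREAM_MAINTENANCE));
        printf("regular stream: %d\n", stream_io_concurrency(0));
        return 0;
    }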
> For the non-maintenance one, we might want
> to think about tweaking that in the context of bitmap heapscan?
>
I'm not sure what you mean. How would we tweak it? Do you mean the GUC, or
some sort of adaptive heuristic?
> (Interestingly, MySQL seems to have a related setting defaulting to
> 10k, but that may be a system-wide setting, IDK. Our settings are
> quite low level, per "operation", or really per stream. In an
> off-list chat with Robert and Andres, we bounced around some new
> names, and the one I liked best was io_concurrency_per_stream. It
> would be accurate but bring a new implementation detail to the UX. I
> actually like that about it: it's like
> max_parallel_workers_per_gather, and also just the way work_mem is
> really work_mem_per_<something>: yes it is low level but is an honest
> expression of how (un)sophisticated our resource usage controls are
> today. Perhaps we'll eventually figure out how to balance all
> resources dynamically from global limits...)
>
Possibly, but I'd guess we're years away from doing that. I'm not sure
anyone has even proposed anything like it.
>> * Why are we limiting ioc to <= 256kB? Per the benchmark it seems it
>> might be beneficial to set even higher values.
>
> That comes from:
>
> #define MAX_IO_COMBINE_LIMIT PG_IOV_MAX
>
> ... which comes from:
>
> /* Define a reasonable maximum that is safe to use on the stack. */
> #define PG_IOV_MAX Min(IOV_MAX, 32)
>
> There are a few places that use either PG_IOV_MAX or
> MAX_IO_COMBINE_LIMIT to size a stack array, but those should be OK
> with a bigger number as the types are small: buffer, pointer, or
> struct iovec (16 bytes). For example, if it were 128 then we could do
> 1MB I/O, a nice round number, and arrays of those types would still
> only be 2kB or less. The other RDBMSes I googled seem to have max I/O
> sizes of around 512kB or 1MB, some tunable explicitly, some derived
> from other settings.
>
> I picked 128kB as the default combine limit because it comes up in
> lots of other places eg readahead size, and seemed to work pretty
> well, and is definitely supportable on all POSIX systems (see POSIX
> IOV_MAX, which is allowed to be as low as 16).
Not sure I follow. Surely if a system can't support values above some
limit, it would define IOV_MAX accordingly, and we'd simply reject any
GUC value above that. And as you point out, IOV_MAX may be as low as 16,
so it's already possible for a GUC value to be rejected on some systems
(even if that's mostly a theoretical issue).
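To illustrate what I mean by "reject" - something along these lines,
purely a sketch, not the actual GUC check-hook machinery or signature:

    #include <limits.h>     /* POSIX puts IOV_MAX here */
    #include <sys/uio.h>    /* ... some systems define it here instead */
    #include <stdbool.h>

    #ifndef IOV_MAX
    #define IOV_MAX 16      /* POSIX minimum, conservative fallback */
    #endif

    /*
     * Hypothetical check for io_combine_limit: values above what the
     * platform's IOV_MAX allows are refused at assignment time, the same
     * way any other out-of-range value is.
     */
    static bool
    check_io_combine_limit(int newval)
    {
        if (newval < 1 || newval > IOV_MAX)
            return false;   /* reject; IOV_MAX may be as low as 16 */
        return true;
    }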
> I chose a maximum that
> was just a bit more, but not much more because I was also worried
> about how many places would eventually finish up wasting a lot of
> memory by having to multiply that by
> number-of-possible-I/Os-in-progress or other similar ideas, but I was
> expecting we'd want to increase it. read_stream.c doesn't do that
> sort of multiplication itself, but you can see a case like that in
> Andres's AIO patchset here:
>
> https://github.com/anarazel/postgres/blob/aio-2/src/backend/storage/aio/aio_init.c
>
> So for example if io_max_concurrency defaults to 1024; you'd get 2MB
> of iovecs in shared memory with PG_IOV_MAX of 128, instead of 512kB
> today. We can always discuss defaults separately but that case
> doesn't seem like a problem from here...
>
Yeah, all of this makes sense. I don't doubt there's a tradeoff, and setting
the GUC to a high value might have a detrimental effect. Still, 256kB seems
a bit too conservative and "not round" ;-)
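For the record, the arithmetic I have in mind, assuming the default 8kB
BLCKSZ, the 16-byte struct iovec from your numbers, and the
io_max_concurrency=1024 default you mention for the AIO branch:

    #include <stdio.h>

    int
    main(void)
    {
        const long blcksz = 8192;               /* default BLCKSZ */
        const long iov_size = 16;               /* sizeof(struct iovec) on 64-bit */
        const long io_max_concurrency = 1024;   /* default mentioned for the AIO branch */

        /* today: PG_IOV_MAX = Min(IOV_MAX, 32) */
        printf("max combined I/O today:   %ld kB\n", 32 * blcksz / 1024);    /* 256 kB */
        printf("stack iovec array today:  %ld B\n", 32 * iov_size);          /* 512 B */

        /* with PG_IOV_MAX raised to 128 */
        printf("max combined I/O at 128:  %ld kB\n", 128 * blcksz / 1024);   /* 1 MB */
        printf("stack iovec array at 128: %ld B\n", 128 * iov_size);         /* 2 kB */

        /* shared-memory iovecs: io_max_concurrency * PG_IOV_MAX entries */
        printf("shm iovecs at 32:  %ld kB\n", io_max_concurrency * 32 * iov_size / 1024);   /* 512 kB */
        printf("shm iovecs at 128: %ld kB\n", io_max_concurrency * 128 * iov_size / 1024);  /* 2 MB */

        return 0;
    }

So even the shared-memory case in the AIO branch grows by ~1.5MB, which
doesn't seem like a reason to keep the limit at 32 vectors.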
regards
--
Tomas Vondra