From: Andres Freund <andres(at)anarazel(dot)de>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: AIO v2.3
Date: 2025-02-10 23:10:46
Message-ID: zjtr5bie37zeegtkmyckncmkboc6gaekmvlpmzuv66hgnjtnih@q5ozo7qnlt2q
Lists: pgsql-hackers
Hi,
On 2025-02-06 11:50:04 +0100, Jakub Wartak wrote:
> Hi Andres, OK, so I've hastily launched an AIO v2.3 (full, 29 patches)
> patchset probe run before going for short vacations, and the results
> are attached*.
Thanks for doing that work!
> TLDR; in terms of SELECTs the master vs aioworkers looks very solid!
Phew! Weee! Yay.
> I was kind of afraid that the additional IPC to separate processes would put
> workers at a little bit of a disadvantage, but that's amazingly not true.
It's a measurable disadvantage, it's just more than counteracted by being able
to do IO asynchronously :).
It's possible to make it more visible by setting io_combine_limit = 1. If you
have small shared_buffers with everything in the kernel cache, the dispatch
overhead starts to be noticeable above several GB/s. But that's ok, I think.
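For instance, something along these lines (a rough sketch - GUC names as in the
patchset, values purely illustrative):

    -- both of these need a restart to take effect
    ALTER SYSTEM SET io_method = 'worker';
    ALTER SYSTEM SET shared_buffers = '128MB';

    -- per session: disable read combining, so every block is its own IO
    SET io_combine_limit = 1;

With the data resident in the kernel page cache, a seqscan then mostly measures
submission/completion overhead rather than the storage itself.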
> The intention of this effort was just to see if committing AIO with defaults as
> it stands is good enough to not cause basic regressions for users, and to me
> it looks like it is nearly finished :)).
That's really good to hear. I think we can improve things a lot in the
future, but we gotta start somewhere...
> 1. not a single crash was observed, but those were pretty short runs
>
> 2. some thoughts from my very limited (in terms of time) data analysis
> - most of the time perf with aioworkers is identical (+/- 3%) to
> master, in most cases it is much BETTER
I assume s/most/some/ for the second most?
> - on parallel seqscans on "sata" with datasets bigger than the VFS cache
> ("big") and high e_io_c with high client counts (sigh!), it looks like
> there would be a user-noticeable big regression, but to me it's not a
> regression in itself; probably we are issuing way too many posix_fadvise()
> readaheads with diminishing returns. Just letting you know. Not sure
> it is worth introducing some global (shared across aioworkers) e_io_c
> limiter, I think not. It could also be some maintenance noise
> on that I/O device, but I have no isolated SATA RAID10 with like 8x
> HDDs at home to launch such a test to be absolutely sure.
I'm inclined to not introduce a global limit for now - it's pretty hard to
make that scale to fast IO devices, so you need a multi-level design, where
each backend can issue a few IOs without consulting the global limit, and only
after that does it need to acquire the right to issue even more IOs from the
shared "pool".
I think this is basically a configuration issue - configuring a high e_io_c
for a device that can't handle it and then loading it up with a lot of clients,
well, that'll not work out great.
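E.g. just lowering the setting for the slow device seems like the better tool
(tablespace name made up, value purely illustrative):

    -- only the SATA array gets a lower readahead depth
    ALTER TABLESPACE sata_raid SET (effective_io_concurrency = 16);

    -- or, if everything lives on that device, globally:
    ALTER SYSTEM SET effective_io_concurrency = 16;
    SELECT pg_reload_conf();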
> 3. with aioworkers it would be worth pointing out in the documentation that
> `iotop` won't be good enough to show which PID is doing I/O anymore.
> I often get questions like this: who is taking most of the I/O right
> now, because storage is fully saturated on a multi-use system. Not sure
> whether it would require a new view or not (pg_aios output seems to be more
> like an in-memory debug view that would have to be sampled
> aggressively, and pg_statio_all_tables shows the table well, but not the PID
> -- same for pg_stat_io). IMHO if the docs were as simple as
> "In order to understand which processes (PIDs) are issuing lots of
> IOs, please check pg_stat_activity for *IO/AioCompletion* wait events"
> it should be good enough for a start.
pg_stat_get_backend_io() should allow answering that, albeit with the usual
weakness of our stats system, namely that the user has to diff two snapshots
themselves. It probably also has the weakness of not showing results for
queries before they've finished, although I think that's something we should
be able to improve without too much trouble (not in this release though, I
suspect).
I guess we could easily reference pg_stat_get_backend_io(), but a more
complete recipe isn't entirely trivial...
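Roughly something like this, assuming the function exposes the same columns as
pg_stat_io (and modulo taking two snapshots and diffing them yourself):

    SELECT a.pid, a.query, io.object, io.context, io.reads, io.writes
    FROM pg_stat_activity a,
         LATERAL pg_stat_get_backend_io(a.pid) AS io
    WHERE a.backend_type = 'client backend'
    ORDER BY io.reads + io.writes DESC;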
> Bench machine: it was intentionally much smaller hardware. Azure's
> Lsv2 L8s_v2 (1st gen EPYC/1s4c8t, with kernel 6.10.11+bpo-cloud-amd64
> and booted with mem=12GB, which limited real usable RAM to just
> ~8GB to stress I/O). liburing 2.9. Standard compile options were
> used, without asserts (such as normal users would use).
Good - the asserts in the aio patches are a bit more noticeable than the ones
in master.
Thanks again for running these!
Greetings,
Andres Freund