Re: AIO v2.0

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, 陈宗志 <baotiao(at)gmail(dot)com>
Subject: Re: AIO v2.0
Date: 2024-09-30 14:49:17
Message-ID: sazl7yyvaae23dysaedc62pu3zfvpc3bytaaqy5lk2sec3cmca@w4gt3tjs2tso
Lists: pgsql-hackers

Hi,

On 2024-09-17 11:08:19 -0700, Noah Misch wrote:
> > - I am worried about the need for bounce buffers for writes of checksummed
> > buffers. That quickly ends up being a significant chunk of memory,
> > particularly when using a small shared_buffers with a higher than default
> > number of connections. I'm currently hacking up a prototype that'd prevent us
> > from setting hint bits with just a share lock. I'm planning to start a
> > separate thread about that.
>
> AioChooseBounceBuffers() limits usage to 256 blocks (2MB) per MaxBackends.
> Doing better is nice, but I don't consider this a blocker. I recommend
> dealing with the worry by reducing the limit initially (128 blocks?). Can
> always raise it later.

On storage that has nontrivial latency, like just about all cloud storage,
even 256 will be too low, particularly for the checkpointer.

Assuming 1ms latency - which isn't the high end of cloud storage latency - 256
8kB blocks in flight limit you to <= ~2GB/s (256 * 8kB / 1ms), even on storage
that can deliver a lot more throughput. With 3ms, which isn't uncommon, it's
~670MB/s.

Of course this could be addressed by tuning, but it seems like something that
shouldn't need to be tuned by the majority of folks running postgres.

We also discussed the topic at https://postgr.es/m/20240925020022.c5.nmisch%40google.com
> ... neither BM_SETTING_HINTS nor keeping bounce buffers looks like a bad
> decision. From what I've heard so far of the performance effects, if it were
> me, I would keep the bounce buffers. I'd pursue BM_SETTING_HINTS and bounce
> buffer removal as a distinct project after the main AIO capability. Bounce
> buffers have an implementation. They aren't harming other design decisions.
> The AIO project is big, so I'd want to err on the side of not designating
> other projects as its prerequisites.

Given the issues caused by modifying pages while they're in flight, not just
with PG-level checksums but also with filesystem-level checksums, I don't feel
like it's a particularly promising approach.

However, I don't think this means the BM_SETTING_HINTS work has to be
completed before we can move forward with AIO. If I split the write portion out
from the read portion a bit further, the main AIO changes and the shared-buffer
read user can be merged before anything depends on the hint-bit work being
done.

Does that seem reasonable?

Greetings,

Andres Freund
