Re: AIO writes vs hint bits vs checksums

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: AIO writes vs hint bits vs checksums
Date: 2024-09-24 19:43:40
Message-ID: 20240924194340.92.nmisch@google.com
Lists: pgsql-hackers

On Tue, Sep 24, 2024 at 11:55:08AM -0400, Andres Freund wrote:
> So far the AIO patchset has solved this by introducing a set of "bounce
> buffers", which can be acquired and used as the source/target of IO when doing
> it in-place into shared buffers isn't viable.
>
> I am worried about that solution however, as either acquisition of bounce
> buffers becomes a performance issue (that's how I did it at first, it was hard
> to avoid regressions) or we reserve bounce buffers for each backend, in which
> case the memory overhead for instances with relatively small amount of
> shared_buffers and/or many connections can be significant.

> But: We can address this and improve performance over the status quo! Today we
> determine tuple visibility one-by-one, even when checking the
> visibility of an entire page worth of tuples. That's not exactly free. I've
> prototyped checking visibility of an entire page of tuples at once and it
> indeed speeds up visibility checks substantially (in some cases seqscans are
> over 20% faster!).

Nice! It sounds like you refactored the relationship between
heap_prepare_pagescan() and HeapTupleSatisfiesVisibility() to move the hint
bit setting upward or the iterate-over-tuples downward. Is that about right?
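
To make the question concrete, here is a toy model of the two shapes, not
PostgreSQL code. ToyBuffer, ToyTuple, the toy_* functions, and the bit values
are all hypothetical; only the BM_SETTING_HINTS name comes from this thread.
It assumes the right to set hint bits is a flag in the buffer header and
counts header manipulations when that right is taken per tuple versus once
per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag name is from this thread; the bit positions are made up. */
#define BM_SETTING_HINTS   (1u << 0)
#define TOY_IO_IN_PROGRESS (1u << 1)   /* hypothetical: a write is in flight */
#define TOY_XMIN_COMMITTED 0x01        /* hypothetical hint bit */

typedef struct ToyBuffer
{
    uint32_t state;       /* stand-in for the buffer header state word */
    int      header_ops;  /* number of header manipulations performed */
} ToyBuffer;

typedef struct ToyTuple
{
    uint32_t xmin;
    uint8_t  hints;
} ToyTuple;

/* Try to get the right to set hint bits; fails during IO or if already held. */
static bool
toy_start_setting_hints(ToyBuffer *buf)
{
    buf->header_ops++;
    if (buf->state & (TOY_IO_IN_PROGRESS | BM_SETTING_HINTS))
        return false;
    buf->state |= BM_SETTING_HINTS;
    return true;
}

static void
toy_finish_setting_hints(ToyBuffer *buf)
{
    buf->header_ops++;
    buf->state &= ~BM_SETTING_HINTS;
}

/* Toy visibility rule: an xid below the horizon counts as committed. */
static bool
toy_xid_committed(uint32_t xid, uint32_t horizon)
{
    return xid < horizon;
}

/* Per-tuple shape: take and release the right for each tuple needing a hint. */
static int
toy_scan_per_tuple(ToyBuffer *buf, ToyTuple *tuples, int n, uint32_t horizon)
{
    int nvisible = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (!toy_xid_committed(tuples[i].xmin, horizon))
            continue;
        nvisible++;
        if (!(tuples[i].hints & TOY_XMIN_COMMITTED) &&
            toy_start_setting_hints(buf))
        {
            tuples[i].hints |= TOY_XMIN_COMMITTED;
            toy_finish_setting_hints(buf);
        }
    }
    return nvisible;
}

/* Per-page shape: take the right once, check all tuples, release once. */
static int
toy_scan_per_page(ToyBuffer *buf, ToyTuple *tuples, int n, uint32_t horizon)
{
    int  nvisible = 0;
    bool can_hint;

    assert(n >= 0);
    can_hint = toy_start_setting_hints(buf);
    for (int i = 0; i < n; i++)
    {
        if (!toy_xid_committed(tuples[i].xmin, horizon))
            continue;
        nvisible++;
        if (can_hint)
            tuples[i].hints |= TOY_XMIN_COMMITTED;
    }
    if (can_hint)
        toy_finish_setting_hints(buf);
    return nvisible;
}
```

In this model the per-tuple shape costs two header manipulations per hinted
tuple, while the per-page shape costs two per page regardless of tuple count.
The model says nothing about the real cost of each manipulation.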

> Once we have page-level visibility checks we can get the right to set hint
> bits once for an entire page instead of doing it for every tuple - with that
> in place the "new approach" of setting hint bits only with BM_SETTING_HINTS
> wins.

How did page-level+BM_SETTING_HINTS performance compare to performance of the
page-level change w/o the BM_SETTING_HINTS change?

> Having a page level approach to setting hint bits has other advantages:
>
> E.g. today, with wal_log_hints, we'll log hint bits on the first hint bit set
> on the page and we don't mark a page dirty on hot standby. Which often will
> result in hint bits not persistently set on replicas until the page is frozen.

Nice way to improve that.

> Does this sound like a reasonable idea? Counterpoints?

I guess the main part left to discuss is index scans or other scan types where
we'd either not do page-level visibility or we'd do page-level visibility
including tuples we wouldn't otherwise use. BM_SETTING_HINTS likely won't
show up so readily in index scan profiles, but the cost is still there. How
should we think about comparing the distributed cost of the buffer header
manipulations during index scans vs. the costs of bounce buffers?

Thanks,
nm
