From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Some read stream improvements |
Date: | 2025-02-17 05:57:46 |
Message-ID: | CA+hUKGJ43PGTH=69=d=c=f8cwGeug548pMxN-jQ6_CohGayUQA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Feb 17, 2025 at 5:55 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> The solution we agreed on is to introduce a way for StartReadBuffers()
> to communicate with future calls, and "forward" pinned buffers between
> calls. The function arguments don't change, but its "buffers"
> argument becomes an in/out array: one StartReadBuffers() call can
> leave extra pinned buffers after the ones that were included in a
> short read (*nblocks), and then when you retry (or possibly extend)
> the rest of the read, you have to pass them back in. That is easy for
> the read stream code, as it can just leave them in its circular queue
> for the next call to take as input. It only needs to be aware of them
> for pin limit accounting and stream reset (including early shutdown).
BTW here's a small historical footnote about that: I had another
solution to the same problem in the original stream proposal[1], which
used a three-step bufmgr API. You had to call PrepareReadBuffer() for
each block you intended to read, and then StartReadBuffers() for each
cluster of adjacent misses it reported, and finally WaitReadBuffers().
Hits would require only the first step. That was intended to allow
the stream to manage the prepared buffers itself: short reads would
leave some of them for a later call. Based on review feedback, I
simplified to arrive at the two-step API, i.e. just start and wait, and
PrepareReadBuffer() became the private helper function
PinBufferForBlock(). I had to invent the "hide-the-trailing-hit"
trick to avoid an unpin/repin sequence in the two-step API, thinking we
might eventually want to consider the three-step API again later.
This new design achieves the same end result: buffers that can't be
part of an I/O stay pinned and in the stream's queue ready for the
next call, just like "prepared" buffers in the early prototypes, but
it keeps the two-step bufmgr API and calls them "forwarded".
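
To make the forwarding idea concrete, here is a toy model in C. It is
only a sketch under stated assumptions, not the real bufmgr code:
toy_start_read(), toy_pin(), toy_demo() and the TOY_* constants are
hypothetical names invented for illustration, the "buffer pool" is just
an array of flags, and the wait step is reduced to marking blocks as
cached. The one thing it tries to show is the in/out "buffers" array: a
buffer that was pinned but could not be part of the I/O stays in the
array, and the next call takes it as input instead of pinning it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_INVALID_BUFFER (-1)
#define TOY_NBLOCKS 4

/* Hypothetical stand-in for the buffer pool: true means "already cached". */
static bool toy_cached[TOY_NBLOCKS];
static int	toy_pin_count;		/* number of pin operations performed */

/* Pretend to pin the buffer holding a block; returns a fake buffer id. */
int
toy_pin(int blocknum)
{
	toy_pin_count++;
	return 100 + blocknum;
}

/*
 * Toy "start" call.  buffers[] is in/out: an entry that is not
 * TOY_INVALID_BUFFER on entry was pinned by an earlier call (forwarded).
 * On return *nblocks says how many blocks this call covers, and any
 * buffer pinned beyond that is left in place for the next call.
 * Returns true if the covered blocks need I/O.
 */
bool
toy_start_read(int *buffers, int blocknum, int *nblocks)
{
	int			requested = *nblocks;
	int			i;

	if (buffers[0] == TOY_INVALID_BUFFER)
		buffers[0] = toy_pin(blocknum);
	if (toy_cached[blocknum])
	{
		*nblocks = 1;			/* a hit: nothing to read */
		return false;
	}

	for (i = 1; i < requested; i++)
	{
		if (buffers[i] == TOY_INVALID_BUFFER)
			buffers[i] = toy_pin(blocknum + i);
		if (toy_cached[blocknum + i])
			break;				/* can't join the I/O: stays pinned, forwarded */
	}
	*nblocks = i;				/* possibly a short read */
	return true;
}

/* Read blocks 0..3 with block 2 already cached; returns the total pins. */
int
toy_demo(void)
{
	int			buffers[TOY_NBLOCKS];
	int			next = 0;

	toy_pin_count = 0;
	for (int i = 0; i < TOY_NBLOCKS; i++)
	{
		buffers[i] = TOY_INVALID_BUFFER;
		toy_cached[i] = (i == 2);
	}

	while (next < TOY_NBLOCKS)
	{
		int			nblocks = TOY_NBLOCKS - next;
		bool		need_io = toy_start_read(&buffers[next], next, &nblocks);

		assert(buffers[next] != TOY_INVALID_BUFFER);
		printf("blocks %d..%d: %s\n", next, next + nblocks - 1,
			   need_io ? "I/O started" : "hit");
		for (int i = 0; i < nblocks; i++)
			toy_cached[next + i] = true;	/* trivial "wait" step */
		next += nblocks;
	}
	return toy_pin_count;
}
```

In this toy run the first call covers blocks 0..1 and leaves block 2's
buffer pinned in the array; the second call consumes it without
pinning again, so toy_demo() returns 4, one pin per block and no
unpin/repin sequence.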