pgsql: Allow ReadStream to be consumed as raw block numbers.

From: Thomas Munro <tmunro(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Allow ReadStream to be consumed as raw block numbers.
Date: 2024-09-17 23:35:45
Message-ID: E1sqhjU-001ZFr-RF@gemulon.postgresql.org

Allow ReadStream to be consumed as raw block numbers.

Commits 041b9680 and 6377e12a changed the interface of
scan_analyze_next_block() to take a ReadStream instead of a BlockNumber
and a BufferAccessStrategy, and to return a value to indicate when the
stream has run out of blocks.

This caused integration problems for at least one known extension, which
uses specially encoded BlockNumber values that map to different
underlying storage: acquire_sample_rows() sets up the stream so that
read_stream_next_buffer() reads blocks from the main fork of the
relation's SMgrRelation, leaving the extension no opportunity to decode
the block numbers first.
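
For illustration only, suppose such an extension (the commit does not
describe any actual scheme) used the top bit of a BlockNumber to select
its own storage area and the low 31 bits as the block within it. Read
directly from the main fork, the value 0x80000005 would name block
2147483653 of the relation rather than block 5 of the alternative
storage. The decoding step the extension needs could look like this
minimal sketch; ALT_STORAGE_BIT, DecodedBlock and decode_block() are
hypothetical names, and BlockNumber is a local stand-in for
PostgreSQL's 32-bit type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's BlockNumber (32-bit unsigned). */
typedef uint32_t BlockNumber;

/*
 * Hypothetical encoding: the top bit selects the extension's own storage,
 * the low 31 bits give the block number within that storage.
 */
#define ALT_STORAGE_BIT ((BlockNumber) 1 << 31)

typedef struct DecodedBlock
{
    bool        alt_storage;    /* true: block lives in the extension's storage */
    BlockNumber block;          /* block number within the selected storage */
} DecodedBlock;

/* Split an encoded BlockNumber into a storage selector and a real block. */
static DecodedBlock
decode_block(BlockNumber encoded)
{
    DecodedBlock result;

    result.alt_storage = (encoded & ALT_STORAGE_BIT) != 0;
    result.block = encoded & ~ALT_STORAGE_BIT;
    return result;
}
```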

Provide read_stream_next_block(), as a way for such an extension to
access the stream of raw BlockNumbers directly and forward them to its
own ReadBuffer() calls after decoding, as it could in earlier releases.
The new function returns the BlockNumber and BufferAccessStrategy that
were previously passed directly to scan_analyze_next_block().
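
A minimal sketch of a scan_analyze_next_block() implementation that
consumes raw block numbers, under these assumptions: ReadStream is
replaced by a mock that hands out a fixed array, the
read_stream_next_block() stub only mirrors the contract described above
(the next BlockNumber, or InvalidBlockNumber at the end, plus the
strategy), MyScan stands in for the AM's TableScanDesc, and my_decode()
and my_read_block() are hypothetical placeholders for the extension's
own decoding and ReadBuffer() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for PostgreSQL types, for illustration only. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
typedef struct BufferAccessStrategyData *BufferAccessStrategy;

/* Hypothetical mock ReadStream that yields a fixed array of block numbers. */
typedef struct ReadStream
{
    const BlockNumber *blocks;
    size_t      nblocks;
    size_t      next;
    BufferAccessStrategy strategy;
} ReadStream;

/*
 * Stub with the contract described in the commit message: consume and
 * return the next block number (InvalidBlockNumber at the end) and report
 * the strategy that would have been used to read it.
 */
static BlockNumber
read_stream_next_block(ReadStream *stream, BufferAccessStrategy *strategy)
{
    *strategy = stream->strategy;
    if (stream->next >= stream->nblocks)
        return InvalidBlockNumber;
    return stream->blocks[stream->next++];
}

/* Hypothetical scan state; a real AM would extend TableScanDesc. */
typedef struct MyScan
{
    BlockNumber last_block;     /* decoded block most recently "read" */
    int         nreads;         /* number of blocks read so far */
} MyScan;

/* Hypothetical decoding: strip a tag stored in the top bit. */
static BlockNumber
my_decode(BlockNumber encoded)
{
    return encoded & 0x7FFFFFFFu;
}

/* Placeholder for the extension's own ReadBuffer() call. */
static void
my_read_block(MyScan *scan, BlockNumber block, BufferAccessStrategy bas)
{
    (void) bas;                 /* a real AM would pass this on when reading */
    scan->last_block = block;
    scan->nreads++;
}

/* Consume one raw block number, decode it, and read it ourselves. */
static bool
my_scan_analyze_next_block(MyScan *scan, ReadStream *stream)
{
    BufferAccessStrategy bas;
    BlockNumber blockno;

    assert(stream != NULL);
    blockno = read_stream_next_block(stream, &bas);
    if (blockno == InvalidBlockNumber)
        return false;           /* stream exhausted: stop sampling */
    my_read_block(scan, my_decode(blockno), bas);
    return true;
}
```

The design keeps the sampler's choice of blocks intact while moving the
actual read under the extension's control, as in releases before the
ReadStream-based interface.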
Alternatively, an extension could wrap the stream of BlockNumbers in
another ReadStream with a callback that performs any decoding required
to arrive at real storage manager BlockNumber values, so that it could
benefit from the I/O combining and concurrency provided by
read_stream.c.
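
The wrapping approach could be sketched as follows, again with mock
types. The callback decode_cb() has the same parameter shape as
ReadStreamBlockNumberCB in read_stream.h: it receives the inner stream
as its private data, pulls one encoded number with
read_stream_next_block(), and returns the decoded number, propagating
InvalidBlockNumber to end the outer stream. In real code the outer
stream would presumably be created with read_stream_begin_relation() on
the storage the decoded numbers refer to and consumed with
read_stream_next_buffer(); here the mock outer stream just invokes the
callback so the decoded sequence can be checked. decode_cb() and the
top-bit encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for PostgreSQL types, for illustration only. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
typedef struct BufferAccessStrategyData *BufferAccessStrategy;

typedef struct ReadStream ReadStream;

/* Same parameter shape as ReadStreamBlockNumberCB in read_stream.h. */
typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream,
                                                void *callback_private_data,
                                                void *per_buffer_data);

/* Hypothetical mock: a fixed list (inner) or callback-driven (outer). */
struct ReadStream
{
    const BlockNumber *blocks;  /* used when callback == NULL */
    size_t      nblocks;
    size_t      next;
    BufferAccessStrategy strategy;
    ReadStreamBlockNumberCB callback;
    void       *callback_private_data;
};

/* Stub: next block number from the callback or list, or InvalidBlockNumber. */
static BlockNumber
read_stream_next_block(ReadStream *stream, BufferAccessStrategy *strategy)
{
    *strategy = stream->strategy;
    if (stream->callback != NULL)
        return stream->callback(stream, stream->callback_private_data, NULL);
    if (stream->next >= stream->nblocks)
        return InvalidBlockNumber;
    return stream->blocks[stream->next++];
}

/* Outer-stream callback: decode the inner stream's encoded numbers. */
static BlockNumber
decode_cb(ReadStream *stream, void *callback_private_data,
          void *per_buffer_data)
{
    ReadStream *inner = (ReadStream *) callback_private_data;
    BufferAccessStrategy bas;
    BlockNumber encoded;

    (void) stream;
    (void) per_buffer_data;
    encoded = read_stream_next_block(inner, &bas);
    if (encoded == InvalidBlockNumber)
        return InvalidBlockNumber;  /* propagate end of stream */
    return encoded & 0x7FFFFFFFu;   /* hypothetical decoding */
}
```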

A different class of table access method, one that is not
block-oriented and therefore does nothing in scan_analyze_next_block(),
could use this function simply to control the number of block sampling
loops. It could match the previous behavior with
"return read_stream_next_block(stream, &bas) != InvalidBlockNumber".
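
For such a table access method, the whole callback could reduce to that
expression. A minimal sketch with mock types: the block number is
discarded, and each successful call simply grants one more sampling
loop. nonblock_scan_analyze_next_block() and count_sampling_loops() are
hypothetical names; the latter is a simplified stand-in for the outer
loop that acquire_sample_rows() runs over scan_analyze_next_block(),
which in reality also fetches sample tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for PostgreSQL types, for illustration only. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
typedef struct BufferAccessStrategyData *BufferAccessStrategy;

/* Hypothetical mock ReadStream that yields a fixed array of block numbers. */
typedef struct ReadStream
{
    const BlockNumber *blocks;
    size_t      nblocks;
    size_t      next;
} ReadStream;

/* Stub: next block number, or InvalidBlockNumber when exhausted. */
static BlockNumber
read_stream_next_block(ReadStream *stream, BufferAccessStrategy *strategy)
{
    *strategy = NULL;           /* the mock has no strategy to report */
    if (stream->next >= stream->nblocks)
        return InvalidBlockNumber;
    return stream->blocks[stream->next++];
}

/* Non-block-oriented AM: ignore the block, just consume the stream. */
static bool
nonblock_scan_analyze_next_block(void *scan, ReadStream *stream)
{
    BufferAccessStrategy bas;

    (void) scan;                /* stand-in for TableScanDesc, unused here */
    return read_stream_next_block(stream, &bas) != InvalidBlockNumber;
}

/* Simplified outer loop: one iteration per sampled block number. */
static int
count_sampling_loops(ReadStream *stream)
{
    int         loops = 0;

    while (nonblock_scan_analyze_next_block(NULL, stream))
        loops++;                /* real ANALYZE would fetch tuples here */
    return loops;
}
```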

Ongoing work is expected to provide better ANALYZE support for table
access methods that don't behave like heapam with respect to storage
blocks, but that will be for future releases.

Back-patch to 17.

Reported-by: Mats Kindahl <mats(at)timescale(dot)com>
Reviewed-by: Mats Kindahl <mats(at)timescale(dot)com>
Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/70d38e3d8a2d2cb88e3add2b90a122dacc941aa4

Modified Files
--------------
src/backend/storage/aio/read_stream.c | 14 ++++++++++++++
src/include/storage/read_stream.h     |  2 ++
2 files changed, 16 insertions(+)
