From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
Subject: | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date: | 2024-10-15 00:34:24 |
Message-ID: | CAAKRu_Zk7dbg4T8ukCAwA=qu9XxAk9DN-oEcmBPQ+2KFscQ+-g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Aug 26, 2024 at 9:37 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> v23-0003-Make-table_scan_bitmap_next_block-async-friendly.patch
>
> > @@ -1955,19 +1954,26 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths
>>
> > + * `blockno` is only used in bitmap table scan code to validate that the
> > + * prefetch block is staying ahead of the current block.
> > *
> > * Note, this is an optionally implemented function, therefore should only be
> > * used after verifying the presence (at plan time or such).
> > */
> > static inline bool
> > table_scan_bitmap_next_block(TableScanDesc scan,
> > - struct TBMIterateResult *tbmres,
> > + BlockNumber *blockno, bool *recheck,
> > long *lossy_pages,
> > long *exact_pages)
> > {
>
> The new comment doesn't really explain what *blockno means. Is it an
> input or output parameter? How is it used with the prefetching?
I've updated this comment.
> v23-0004-Add-common-interface-for-TBMIterators.patch
>
> > +void
> > +tbm_begin_iterate(TBMIterator *iterator, TIDBitmap *tbm,
> > + dsa_area *dsa, dsa_pointer dsp)
> > +{
> > + Assert(iterator);
> > +
> > + iterator->private_iterator = NULL;
> > + iterator->shared_iterator = NULL;
> > + iterator->exhausted = false;
> > +
> > + /* Allocate a private iterator and attach the shared state to it */
> > + if (DsaPointerIsValid(dsp))
> > + iterator->shared_iterator = tbm_attach_shared_iterate(dsa, dsp);
> > + else
> > + iterator->private_iterator = tbm_begin_private_iterate(tbm);
> > +}
>
> Hmm, I haven't looked at how this is used the later patches, but a
> function signature where some parameters are used or not depending on
> the situation seems a bit awkward. Perhaps it would be better to let the
> caller call tbm_attach_shared_iterate(dsa, dsp) or
> tbm_begin_private_iterate(tbm), and provide a function to turn that into
> a TBMIterator? Something like:
>
> TBMIterator *tbm_iterator_from_shared_iterator(TBMSharedIterator *);
> TBMIterator *tbm_iterator_from_private_iterator(TBMPrivateIterator *);
I thought about this a lot. I've changed the struct a bit but have
left the tbm_begin_iterate() interface mostly the same. I actually
think having some parameters that are used only for parallel and some
only for serial is better than exposing the private/shared versions of
the iterators only for tbm_begin_iterate() (which I would have to do
if the caller calls
tbm_attach_shared_iterate()/tbm_begin_private_iterate() directly).
> Does tbm_iterator() really need the 'exhausted' flag? The private and
> shared variants work without one.
I got rid of this.
> v23-0006-BitmapHeapScan-uses-unified-iterator.patch
>
> Makes sense. (Might be better to squash this with the previous patch)
Done.
> v23-0008-Push-current-scan-descriptor-into-specialized-sc.patch
> v23-0009-Remove-ss_current-prefix-from-ss_currentScanDesc.patch
>
> LGTM. I would squash these together.
I wanted to make a change that was lower impact than a new top level
scan descriptor. So, I made a few changes to the design to allow that.
Those rendered v23-0008 and v23-0009 moot.
See v24-0003 and v24-0007 for the new design. I could use input on
whether or not this strategy seems acceptable.
> v23-0010-Add-scan_in_progress-to-BitmapHeapScanState.patch
>
> > --- a/src/include/nodes/execnodes.h
> > +++ b/src/include/nodes/execnodes.h
> > @@ -1804,6 +1804,7 @@ typedef struct ParallelBitmapHeapState
> > * prefetch_target current target prefetch distance
> > * prefetch_maximum maximum value for prefetch_target
> > * initialized is node is ready to iterate
> > + * scan_in_progress is this a rescan
> > * pstate shared state for parallel bitmap scan
> > * recheck do current page's tuples need recheck
> > * blockno used to validate pf and current block in sync
> > @@ -1824,6 +1825,7 @@ typedef struct BitmapHeapScanState
> > int prefetch_target;
> > int prefetch_maximum;
> > bool initialized;
> > + bool scan_in_progress;
> > ParallelBitmapHeapState *pstate;
> > bool recheck;
> > BlockNumber blockno;
>
> Hmm, the "is this a rescan" comment sounds inaccurate, because it is set
> as soon as the scan is started, not only when rescanning. Other than
> that, LGTM.
It turns out I didn't need scan_in_progress once I rearranged some
things anyway.
- Melanie
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2024-10-15 01:08:10 | Re: Add contrib/pg_logicalsnapinspect |
Previous Message | Bruce Momjian | 2024-10-14 23:41:35 | Re: Jargon and acronyms on this mailing list |