From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: heapgettup refactoring |
Date: | 2022-11-04 15:51:03 |
Message-ID: | CAAKRu_ZJg_N7zHtWP+JoSY_hrce4+GKioL137Y2c2En-kuXQ7g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the review!
Attached is v2 with feedback addressed.
On Mon, Oct 31, 2022 at 9:09 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > From 9d8b01960463dc64ff5b111d523ff80fce3017af Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 31 Oct 2022 13:40:06 -0400
> > Subject: [PATCH v1 2/3] Turn HeapKeyTest macro into function
> >
> > This should always be inlined appropriately now. It is easier to read as
> > a function. Also, remove unused include in catcache.c.
> > ---
> > src/backend/access/heap/heapam.c | 10 ++--
> > src/backend/utils/cache/catcache.c | 1 -
> > src/include/access/valid.h | 76 ++++++++++++------------------
> > 3 files changed, 36 insertions(+), 51 deletions(-)
> >
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index 12be87efed..1c995faa12 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -716,8 +716,10 @@ heapgettup(HeapScanDesc scan,
> > snapshot);
> >
> > if (valid && key != NULL)
> > - HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
> > - nkeys, key, valid);
> > + {
> > + valid = HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
> > + nkeys, key);
> > + }
> >
> > if (valid)
> > {
>
> superfluous parens.
fixed.
> > From a894ce38c488df6546392b9f3bd894b67edf951e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 31 Oct 2022 13:40:29 -0400
> > Subject: [PATCH v1 3/3] Refactor heapgettup* and heapgetpage
> >
> > Simplify heapgettup(), heapgettup_pagemode(), and heapgetpage(). All
> > three contained several unnecessary local variables, duplicate code, and
> > nested if statements. Streamlining these improves readability and
> > extensibility.
>
> It'd be nice to break this into slightly smaller chunks.
I can do that. Since incorporating feedback will be harder once I break
it up into smaller chunks, I'm inclined to wait to do so until I know
that the structure I have now is the one we will go with. (I know smaller
chunks will make it more reviewable.)
> > +
> > +static inline void heapgettup_no_movement(HeapScanDesc scan)
> > +{
>
> FWIW, for function definitions we keep the return type (and with that also the
> the "static inline") on a separate line.
Fixed
>
> > + ItemId lpp;
> > + OffsetNumber lineoff;
> > + BlockNumber page;
> > + Page dp;
> > + HeapTuple tuple = &(scan->rs_ctup);
> > + /*
> > + * ``no movement'' scan direction: refetch prior tuple
> > + */
> > +
> > + /* Since the tuple was previously fetched, needn't lock page here */
> > + if (!scan->rs_inited)
> > + {
> > + Assert(!BufferIsValid(scan->rs_cbuf));
> > + tuple->t_data = NULL;
> > + return;
>
> Is it possible to have a no-movement scan with an uninitialized scan? That
> doesn't really seem to make sense. At least that's how I understand the
> explanation for NoMovement nearby:
> * dir == NoMovementScanDirection means "re-fetch the tuple indicated
> * by scan->rs_ctup".
>
> We can't have a rs_ctup without an already started scan, no?
>
> Looks like this is pre-existing code that you just moved, but it still seems
> wrong.
Changed to an assert
>
> > + }
> > + page = ItemPointerGetBlockNumber(&(tuple->t_self));
> > + if (page != scan->rs_cblock)
> > + heapgetpage((TableScanDesc) scan, page);
>
>
> We have a
> BlockNumber page;
> and
> Page dp;
> in this code which seems almost intentionally confusing. This again is a
> pre-existing issue but perhaps we could clean it up first?
in attached
page -> block
dp -> page
in basically all locations in heapam.c (should that be its own commit?)
> > +static inline Page heapgettup_continue_page(HeapScanDesc scan, BlockNumber page, ScanDirection dir,
> > + int *linesleft, OffsetNumber *lineoff)
> > +{
> > + HeapTuple tuple = &(scan->rs_ctup);
>
> Hm. Finding the next offset via rs_ctup doesn't seem quite right. For one,
> it's not actually that cheap to extract the offset from an ItemPointer because
> of the the way we pack it into ItemPointerData.
So, it was like this before [1].
What about saving the lineoff in rs_cindex.
It is smaller, but that seems okay, right?
> > + Page dp = BufferGetPage(scan->rs_cbuf);
> > + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp);
>
> Newlines between definitions and code :)
k
> Perhaps worth asserting that the scan is initialized and that rs_cbuf is valid?
indeed.
>
> > + if (ScanDirectionIsForward(dir))
> > + {
> > + *lineoff = OffsetNumberNext(ItemPointerGetOffsetNumber(&(tuple->t_self)));
> > + *linesleft = PageGetMaxOffsetNumber(dp) - (*lineoff) + 1;
>
> We can't access PageGetMaxOffsetNumber etc without holding a lock on the
> page. It's not immediately obvious that that is held in all paths.
In heapgettup() I call LockBuffer() before invoking
heapgettup_continue_page() and heapgettup_start_page() which are the
ones doing this.
I did have big plans for using the continue_page and start_page
functions in heapgettup_pagemode() as well, but since I'm not doing that
now, I can add in an expectation that the lock is held.
I added a comment saying the caller is responsible for acquiring the
lock if needed. I thought of adding an assert, but I don't see that
being done outside of bufmgr.c
BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1);
Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
> > +static inline BlockNumber heapgettup_initial_page(HeapScanDesc scan, ScanDirection dir)
> > +{
> > + Assert(!ScanDirectionIsNoMovement(dir));
> > + Assert(!scan->rs_inited);
>
> Is there a reason we couldn't set rs_inited in here, rather than reapeating
> that in all callers?
I wasn't sure if future callers or existing callers in the future may
need to do steps other than what is in heapgettup_initial_page() before
setting rs_inited. I thought of the responsibility of
heapgettup_initial_page() as returning the initial page to start the
scan. If it is going to do all initialization steps, perhaps the name
should change? I thought having a function that says it does
initialization of the scan might be confusing since initscan() also
exists.
> ISTM that this function should deal with the
> /*
> * return null immediately if relation is empty
> */
>
> logic, I think you now are repeating that check on every call to heapgettup().
So, that's a good point. If I move setting rs_inited inside of
heapgettup_initial_page(), then I can also easily move the empty table
check inside there too.
I don't want to set rs_inited before every return in
heapgettup_initial_page(). Do you think there are any issues with
setting it at the top of the function?
I thought about setting it at the very top (even before checking if the
relation is empty) Is it okay to set it before the empty table check?
rs_inited will be set to false at the bottom before returning. But,
maybe this will be an issue in other callers of
heapgettup_initial_page()?
Anyway, I have changed it in attached v2.
> > @@ -511,182 +711,55 @@ heapgettup(HeapScanDesc scan,
> > ScanKey key)
> > {
> > HeapTuple tuple = &(scan->rs_ctup);
> > - Snapshot snapshot = scan->rs_base.rs_snapshot;
> > - bool backward = ScanDirectionIsBackward(dir);
> > BlockNumber page;
> > - bool finished;
> > Page dp;
> > - int lines;
> > OffsetNumber lineoff;
> > int linesleft;
> > - ItemId lpp;
> > +
> > + if (ScanDirectionIsNoMovement(dir))
> > + return heapgettup_no_movement(scan);
>
> Maybe add an unlikely() - this path is barely ever used...
done.
> > + page = scan->rs_cblock;
> > + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
> > + dp = heapgettup_continue_page(scan, page, dir, &linesleft, &lineoff);
> > + goto continue_page;
> > }
> >
> > /*
> > * advance the scan until we find a qualifying tuple or run out of stuff
> > * to scan
> > */
> > - lpp = PageGetItemId(dp, lineoff);
> > - for (;;)
> > + while (page != InvalidBlockNumber)
> > {
> > + heapgetpage((TableScanDesc) scan, page);
> > + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
> > + dp = heapgettup_start_page(scan, page, dir, &linesleft, &lineoff);
> > + continue_page:
>
>
> I don't like the goto continue_page at all. Seems that the paths leading here
> should call LockBuffer(), heapgettup_start_page() etc? Possibly a do {} while
> () loop could do the trick as well.
I don't see how a do while loop would solve help with the problem.
We need to check if the block number is valid after getting a block
assignment before doing heapgetpage() (e.g. after
heapgettup_initial_page() and after heapgettup_advance_page()).
Removing the goto continue_page means adding the heapgettpage(),
heapgettup_start_page(), etc code block in two places now (both after
heapgettup_initial_page() and after heapgettup_advance_page()) and, in
both locations we have to add an if statement to check if the block is
valid. I feel like this makes the function longer and harder to
understand. Keeping the loop as short as possible makes it clear what it
is doing. I think that with an appropriate warning comment, the goto
continue_page is clearer and easier to understand. To me, starting a
page at the top of the outer loop, then looping through the lines in the
page and is the structure that makes the most sense.
- Melanie
[1] https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L572
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Remove-breaks-in-HeapTupleSatisfiesVisibility.patch | application/octet-stream | 1.4 KB |
v2-0002-Turn-HeapKeyTest-macro-into-function.patch | application/octet-stream | 3.7 KB |
v2-0003-Refactor-heapgettup-and-heapgetpage.patch | application/octet-stream | 32.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jacob Champion | 2022-11-04 15:58:34 | Re: Add non-blocking version of PQcancel |
Previous Message | Peter Eisentraut | 2022-11-04 15:45:35 | Re: psql: Add command to use extended query protocol |