Re: Converting SetOp to read its two inputs separately

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Converting SetOp to read its two inputs separately
Date: 2024-11-29 05:37:48
Message-ID: CAApHDvowLG3wd-5hXdtS++7NfGYYGdwVGfZi8Hncm6p0MvZBHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 20 Nov 2024 at 15:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Once I'd wrapped my head around how things are done now (which the
> comments in prepunion.c were remarkably unhelpful about), I saw that
> most of the problem for #2 just requires re-ordering things that
> generate_nonunion_paths was already doing. As for #1, I have a modest
> proposal: we should get rid of set_operation_ordered_results_useful
> entirely. It's not the code that actually does useful work, and
> keeping it in sync with the code that does do useful work is hard and
> unnecessary.
>
> 0001-0003 below are the same as before (so the slot-munging TODO is
> still there). 0004 fixes a rather basic bug for nested set-operations
> and gets rid of set_operation_ordered_results_useful along the way.
> Then 0005 does your step 2.

Here's a quick review of all 5 patches together.

1. In setop_load_group(), the primary concern with the result of
setop_compare_slots() seems to be if the tuples match or not. If
you're not too concerned with keeping the Assert checking for
mis-sorts, then it could be much more efficient for many cases to
check the final column first. ExecBuildGroupingEqual() makes use of
this knowledge already, so it's nothing new. Maybe you could just use
an ExprState made by ExecBuildGroupingEqual() instead of
setop_compare_slots() for this case. That would allow JIT to work.

2. (related to #1) I'm unsure if it's also worth deforming all the
needed attributes in one go rather than calling slot_getattr() each
time, which will result in incrementally deforming the tuple.
ExecBuildGroupingEqual() also does that.

3.
/* XXX hack: force the tuple into minimal form */
/* XXX should not have to do this */
ExecForceStoreHeapTuple(ExecCopySlotHeapTuple(innerslot),
setopstate->ps.ps_ResultTupleSlot, true);
innerslot = setopstate->ps.ps_ResultTupleSlot;

nodeAgg.c seems to do this by using prepare_hash_slot() which deforms
the heap tuple and sets the Datums verbatim rather than making copies
of any byref ones.

4. Since the EXPLAIN output is going to change for SetOps, what are
your thoughts on being more explicit about the setop strategy being
used? Showing "SetOp Except" isn't that informative. You've got to
look at the non-text format to know it's using the merge method. How
about "MergeSetOp Except"?

5. You might not be too worried, but in SetOpState if you move
need_init up to below setop_done, you'll close a 7-byte hole in that
struct

I'm also happy if you just push the 0004 patch. Are you planning to
backpatch that? If I'd realised you were proposing to remove that
function, I'd not have fixed the typo Richard mentioned.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zharkov Roman 2024-11-29 05:39:52 Re: plperl version on the meson setup summary screen
Previous Message Andrei Lepikhov 2024-11-29 05:10:42 Re: POC, WIP: OR-clause support for indexes