Re: Converting SetOp to read its two inputs separately

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Converting SetOp to read its two inputs separately
Date: 2024-11-29 19:44:25
Message-ID: 534399.1732909465@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> Here's a quick review of all 5 patches together.

Thanks for looking at it!

> 1. In setop_load_group(), the primary concern with the result of
> setop_compare_slots() seems to be if the tuples match or not. If
> you're not too concerned with keeping the Assert checking for
> mis-sorts, then it could be much more efficient for many cases to
> check the final column first.

Hm. setop_compare_slots() has to deliver a 3-way compare result for
its usage in setop_retrieve_sorted(), where we're actually doing
a merge. We could potentially use a different subroutine within
setop_load_group(), but I'm not sure it's worth added complexity.

> 2. (related to #1) I'm unsure if it's also worth deforming all the
> needed attributes in one go rather than calling slot_getattr() each
> time, which will result in incrementally deforming the tuple.

Good point; probably worth doing given that the low-order
column is often going to determine the result.

>> /* XXX hack: force the tuple into minimal form */
>> /* XXX should not have to do this */

> nodeAgg.c seems to do this by using prepare_hash_slot() which deforms
> the heap tuple and sets the Datums verbatim rather than making copies
> of any byref ones.

I'll take a look at that, thanks for the pointer.

> 4. Since the EXPLAIN output is going to change for SetOps, what are
> your thoughts on being more explicit about the setop strategy being
> used? Showing "SetOp Except" isn't that informative. You've got to
> look at the non-text format to know it's using the merge method. How
> about "MergeSetOp Except"?

I'm confused here. explain.c already uses "SetOp" versus "HashSetOp"
to show the strategy. Are you saying we should instead print
"MergeSetOp" and "HashSetOp"? That seems like change for the sake
of change, really.

> I'm also happy if you just push the 0004 patch. Are you planning to
> backpatch that?

I've been debating whether to propose a back-patch of that. It's
clearly a bug fix in the sense that the current code isn't doing
what was intended. On the other hand, it isn't (I think) resulting
in any wrong answers, just plans that are a bit less efficient
than they could be, and perhaps some planning time wasted on making
Paths that can't be used. And generally we refrain from making
unnecessary changes of plan choices in released branches, because
users value plan stability. So on the whole I'm leaning towards
not back-patching, but I could be persuaded otherwise if anyone
feels strongly that we should.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-11-29 19:45:38 Re: Count and log pages set all-frozen by vacuum
Previous Message Thomas Munro 2024-11-29 19:30:41 Re: Cannot find a working 64-bit integer type on Illumos