Re: Joins on TID

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Edmund Horner <ejrh00(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Joins on TID
Date: 2019-01-01 21:00:38
Message-ID: 12958.1546376438@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Edmund Horner <ejrh00(at)gmail(dot)com> writes:
> On Sat, 22 Dec 2018 at 12:34, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I decided to spend an afternoon seeing exactly how much work would be
>> needed to support parameterized TID scans, ie nestloop-with-inner-TID-
>> scan joins, as has been speculated about before, most recently here:
>> ...

> It seems good, and I can see you've committed it now. (I should have
> commented sooner, but it's the big summer holiday period here, which
> means I have plenty of time to work on PostgreSQL, but none of my
> usual resources. In any case, I was going to say "this looks useful
> and not too complicated, please go ahead".)

OK.

> I did notice that multiple tidquals are no longer removed from scan_clauses:
> EXPLAIN SELECT * FROM pg_class WHERE ctid = '(1,1)' OR ctid = '(2,2)';
> Tid Scan on pg_class (cost=0.01..8.03 rows=2 width=265)
> TID Cond: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))
> Filter: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))

I fixed that in the committed version, I believe. (I'd been
overoptimistic about whether logic could be removed from
create_tidscan_plan.)

>> I haven't really looked at how much of a merge problem there'll be
>> with Edmund Horner's work for TID range scans. My feeling about it
>> is that we might be best off treating that as a totally separate
>> code path, because the requirements are significantly different (for
>> instance, a range scan needs AND semantics not OR semantics for the
>> list of quals to apply).

> Well, I guess it's up to me to merge it. I can't quite see which
> parts we'd use a separate code path for. Can you elaborate?

The thing that's bothering me is something I hadn't really focused on
before, but which looms large now that I've thought about it: the
TID-quals list of a TidPath or TidScan has OR semantics, viz it can
directly represent

ctid = this OR ctid = that OR ctid = the_other

as a list of tideq OpExprs. But what you want for a range scan on
TID is implicit-AND, because you might have either a one-sided
condition, say

ctid >= this

or a range condition

ctid >= this AND ctid <= that

I see that what you've done to make this sort-of work in the existing
patch is to insist that a range scan have just one member at the OR-d list
level and that has to be an AND'ed sublist, but TBH I think that's a mess;
for instance I wonder whether the code works correctly if faced with cases
like

ctid >= this OR ctid <= that

I don't think it's at all practical to have tidpath.c dealing with both
cases in one scan of the quals --- even if you can make it work, it'll be
unreasonably complicated and hard to understand. I'd be inclined to just
have it thumb through the restrictinfo or joininfo list a second time,
looking for inequalities, and build a path for that case separately.

I suspect that on the whole, you'd be better off treating the range-scan
case as completely separate, with a different Path type and different
Plan type too (ie, separate executor support). Yes, this would involve
some duplication of support code, but I think the end result would be
a lot cleaner and easier to understand.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Shaplov 2019-01-01 21:12:22 Re: [PATCH][PROPOSAL] Add enum releation option type
Previous Message Nikolay Shaplov 2019-01-01 20:21:45 Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead