Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dmitry Astapov <dastapov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date: 2022-02-02 07:37:13
Message-ID: CAKU4AWrecbiuXaF4rVTji4=ow9VL-WBoZdjSCyQSDVmcgTL=Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Justin:

Thanks for your attention.

On Wed, Feb 2, 2022 at 1:13 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> > Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.
>
> If you use git-am, then the author/commit information is preserved.
> It's probably good to include a link to the patch in any case.
>
>
Thanks for this reminder, I didn't pay enough attention to this. Fixed.

(The original patch looks like a diff file not a commit, I wrote a simple
commit
message for this and link to the origin discussion link.)

> > Subject: [PATCH v1 4/6] remove duplicated qual executing.
>
> ---
>
>
> src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
>
>
> src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
>
>
> src/include/optimizer/paths.h | 2 ++
>
>
> src/test/regress/parallel_schedule | 2 ++
>
>
> 4 files changed, 53 insertions(+), 2 deletions(-)
>
>
>
> I think the ./ec_filter test is missing from from this patch.
>
>
Indeed..

> > Subject: [PATCH v1 6/6] adding some test cases for this feature and fix
> the existing case
>
>
> The tests should be updated with the corresponding patches. It's common
> for
> the patches to be commited separately, like if 0001 is ready but the
> others are
> still evolving.
>

Yes, I agree with this. Just that in this case, the commit split is just
for easy
review/discussion. they are unlikely to be able to commit separately. so I
keep
it as it was and improve each commit message.

>
> I'm not sure whether you think this patch is ready to be added to a
> commitfest,
> but do you know about the CI infrastructure ? It allows running all the
> cfbot
> tests for a github branch against 4 OS, which helps catch portability
> issues,
> including memory errors and unstable explain output. See:
> src/tools/ci/README.
>

Added. https://commitfest.postgresql.org/37/3524/

> There's an failure in postgres_fdw, probably the output needs to be
> updated.
>

For the postgres_fdw, I just refreshed the content. with this patch, the
plan changed
from

Foreign Scan
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1,
r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1
INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND
((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)

to

Sort (cost=108.02..108.04 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Sort Key: ft5.c1
-> Foreign Scan (cost=100.00..107.92 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN
ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S
1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >=
10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.
c1 <= 30))))

But if I set enable_sort = off, we can still get the previous plan, which
proves that
this patch doesn't make the above path unavailable, it is just not cheaper
than
the new one. Here is the new commit messages:

commit e0a7838a09e73f831eecb23b5e7884cc34d71301
Author: David Rowley <dgrowleyml(at)gmail(dot)com>
Date: Tue Feb 1 20:56:40 2022 +0800

Introudce ec_filters in EquivalenceClass struct, the semantics is the
quals can

be applied to any EquivalenceMember in this EC. Later this information
is used
to generate new RestrictInfo and was distributed to related RelOptInfo
very
soon.

Author: David Rowley at 2015-12 [1]
Andy Fan rebase this patch to current latest code.

https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com

commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Tue Feb 1 14:54:07 2022 +0800

After distributing the new derived RestrictInfo into RelOptInfo, then
the rows
estimation is wrong at the joinrel part. The reason is well described
at [1] and
[2], To fix this issue, I added a new field "EquivalenceClass
*derived" in
RestrictInfo struct to indicate how this qual is generated. we would
ignore such
qual during estimate of the rows size. All the set_xx_size should be
taken care of, but
for now, just set_plain_rel_size is taken care of for the PoC purpose.

[1]

https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
[2]

https://www.postgresql.org/message-id/flat/1727507.1620948117%40sss.pgh.pa.us#52ac3f46cf614acb0bdbddb7128f5bd2

commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Tue Feb 1 15:20:10 2022 +0800

Introduce RelOptInfo.filtered_rows.

Previously the Path.rows (shown in the explain output) and
RelOptInfo.rows
which would be used to calculating joinrel's estimated rows are same
at many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on.
But
they would be different after distributing a new restrictinfo from
ec_filter.
So I developed RelOptInfo.filtered_rows to take some duty out of
RelOptInfo.rows.

commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Tue Feb 1 17:37:27 2022 +0800

Remove duplicated qual executing for executor.

Take the SELECT * FROM t1, t2 WHERE t1.a = t2.a and t2.a > 3 for
example,
we can derive t1.a > 3 with EC filter infrastructure. However if it
generate a
plan like below, the new generated qual does not deserve to execute.

Nest Loop
Seq Scan (t1.a > 3)
Index Scan t2_a
(a = t1.a) (t2.a > 3)

This patch removes the "t2.a > 3" for the above case.

commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Wed Feb 2 11:54:24 2022 +0800

Support ScalarArrayOpExpr and perudoconstant on ef_filter.

commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Wed Feb 2 11:59:53 2022 +0800

Added the testcase for this feature and fix the previous test case

as well. The new added test case needs outputting some runtime
statistics, which will probably be different at each run. I can think
of a way to make the test case stable if the patchsets are not wrong
at the first step.

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-02-02 07:49:57 Re: RFC: Logging plan of the running query
Previous Message David G. Johnston 2022-02-02 07:36:08 Re: Design of pg_stat_subscription_workers vs pgstats