Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
Date: 2020-09-20 17:36:00
Message-ID: CAEudQAoSTe=FOdrC9mnjhMRQHscXvqqtcnRSAoQEcQNYMGmKRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em seg., 21 de set. de 2020 às 14:24, Tomas Vondra <
tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:

> On Sun, Sep 20, 2020 at 01:50:34PM -0300, Ranier Vilela wrote:
> >Em seg., 21 de set. de 2020 às 13:37, Tomas Vondra <
> >tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:
> >
> >> On Mon, Sep 21, 2020 at 08:32:35AM -0300, Ranier Vilela wrote:
> >> >Em dom., 20 de set. de 2020 às 21:10, Tomas Vondra <
> >> >tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:
> >> >
> >> >> On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:
> >> >> >Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
> >> >> >tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:
> >> >> >
> >> >> >> On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:
> >> >> >> >Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
> >> >> >> >tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:
> >> >> >> >
> >> >> >> >> On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:
> >> >> >> >> >Hi,
> >> >> >> >> >
> >> >> >> >> >In case gd->any_hashable is FALSE, grouping_is_hashable is
> never
> >> >> >> called.
> >> >> >> >> >In this case, the planner could use HASH for groupings, but
> will
> >> >> never
> >> >> >> >> know.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> The condition is pretty simple - if the query has grouping
> sets,
> >> >> look at
> >> >> >> >> grouping sets, otherwise look at groupClause. For this to be an
> >> issue
> >> >> >> >> the query would need to have both grouping sets and
> (independent)
> >> >> group
> >> >> >> >> clause - which AFAIK is not possible.
> >> >> >> >>
> >> >> >> >Uh?
> >> >> >> >(parse->groupClause != NIL) If different from NIL we have
> >> >> ((independent)
> >> >> >> >group clause), grouping_is_hashable should check?
> >> >> >> >(gd ? gd->any_hashable :
> grouping_is_hashable(parse->groupClause))))
> >> >> >> >If gd is not NIL and gd->any_hashtable is FALSE?
> >> >> >> >
> >> >> >>
> >> >> >> Sorry, I'm not quite sure I understand what this is meant to say
> :-(
> >> >> >>
> >> >> >> Anyway, (groupClause != NIL) does not mean the groupClause is
> somehow
> >> >> >> independent (of what?). Add some debugging to
> create_grouping_paths
> >> and
> >> >> >> you'll see that e.g. this query ends up with groupClause with 3
> >> items:
> >> >> >>
> >> >> >> select 1 from t group by grouping sets ((a), (b), (c));
> >> >> >>
> >> >> >> and so does this one:
> >> >> >>
> >> >> >> select 1 from t group by grouping sets ((a,c), (a,b));
> >> >> >>
> >> >> >> I'm no expert in grouping sets, but I'd bet this means we
> transform
> >> the
> >> >> >> grouping sets into a groupClause by extracting the keys. I haven't
> >> >> >> investigated why exactly we do this, but I'm sure there's a reason
> >> (e.g.
> >> >> >> it gives us SortGroupClause).
> >> >> >>
> >> >> >> You seem to believe a query can have both grouping sets and
> regular
> >> >> >> grouping at the same level - but how would such query look like?
> >> Surely
> >> >> >> you can't have two GROuP BY clauses. You can do
> >> >> >>
> >> >> >Translating into code:
> >> >> >gd is grouping sets and
> >> >> >parse->groupClause is regular grouping
> >> >> >and we cannot have both at the same time.
> >> >> >
> >> >>
> >> >> Have you verified the claim that we can't have both at the same
> time? As
> >> >> I already explained, we build groupClause from grouping sets. I
> suggest
> >> >> you do some debugging using the queries I used as examples, and
> you'll
> >> >> see the claim is not true.
> >> >>
> >> >I think we already agreed that we cannot have both at the same time.
> >> >
> >>
> >> No, we haven't. I think we've agreed that you can't have both a group by
> >> clause with grouping sets and then another group by clause with "plain"
> >> grouping. But even with grouping sets you'll have groupClause generated
> >> from the grouping sets. See preprocess_grouping_sets.
> >>
> >> >
> >> >>
> >> >> >
> >> >> >> select 1 from t group by a, grouping sets ((b), (c));
> >> >> >>
> >> >> >> which is however mostly equivalent to (AFAICS)
> >> >> >>
> >> >> >> select 1 from t group by grouping sets ((a,b), (a,c))
> >> >> >>
> >> >> >> so it's not like we have an independent groupClause either I
> think.
> >> >> >>
> >> >> >> The whole point of the original condition is - if there are
> grouping
> >> >> >> sets, check if at least one can be executed using hashing (i.e.
> all
> >> keys
> >> >> >> are hashable). Otherwise (without grouping sets) look at the
> >> grouping as
> >> >> >> a whole.
> >> >> >>
> >> >> >Or if gd is NULL check parse->groupClause.
> >> >> >
> >> >>
> >> >> Which is what I'm saying.
> >> >>
> >> >> >
> >> >> >> I don't see how your change improves this - if there are grouping
> >> sets,
> >> >> >> it's futile to look at the whole groupClause if at least one
> grouping
> >> >> >> set can't be hashed.
> >> >> >>
> >> >> >> But there's a simple way to disprove this - show us a case (table
> >> and a
> >> >> >> query) where your code correctly allows hashing while the current
> one
> >> >> >> does not.
> >> >> >
> >> >> >I'm not an expert in grouping either.
> >> >> >The question I have here is whether gd is populated and has gd->
> >> >> >any_hashable as FALSE,
> >> >> >Its mean no use checking parse-> groupClause, it's a waste of time,
> Ok.
> >> >> >
> >> >>
> >> >> I'm sorry, I don't follow your logic. Those are two separate cases.
> If
> >> >> we have grouping sets, we have to check if at least one can be
> hashed.
> >> >> If we don't have grouping sets, we have to check groupClause
> directly.
> >> >> Why would that be a waste of time is unclear to me.
> >> >>
> >> >This is clear to me.
> >> >The problem is how it was implemented in create_grouping_paths.
> >> >
> >>
> >> You haven't demonstrated what the problem is, though. Showing us a query
> >> that fails would make it very clear.
> >>
> >> >
> >> >>
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >> >> For hashing to be worth considering, at least one grouping set
> has
> >> >> to be
> >> >> >> >> hashable - otherwise it's pointless.
> >> >> >> >>
> >> >> >> >> Granted, you could have something like
> >> >> >> >>
> >> >> >> >> GROUP BY GROUPING SETS ((a), (b)), c
> >> >> >> >>
> >> >> >> >> which I think essentially says "add c to every grouping set"
> and
> >> that
> >> >> >> >> will be covered by the any_hashable check.
> >> >> >> >>
> >> >> >> >I am not going into the merit of whether or not it is worth
> hashing.
> >> >> >> >grouping_is_hashable as a last resort, must decide.
> >> >> >> >
> >> >> >>
> >> >> >> I don't know what this is supposed to say either. The whole point
> of
> >> >> >> this check is to simply skip construction of hash-based paths in
> >> cases
> >> >> >> when it's obviously futile (i.e. when some of the keys don't
> support
> >> >> >> hashing). We do this as early as possible, because the whole point
> >> is to
> >> >> >> save precious CPU cycles during query planning.
> >> >> >>
> >> >> >
> >> >> >> >
> >> >> >> >> >Apparently gd pointer, will never be NULL there, verified with
> >> >> >> Assert(gd
> >> >> >> >> !=
> >> >> >> >> >NULL).
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> Um, what? If you add the assert right before the if condition,
> you
> >> >> won't
> >> >> >> >> even be able to do initdb. It's pretty clear it'll crash for
> any
> >> >> query
> >> >> >> >> without grouping sets.
> >> >> >> >>
> >> >> >> >Here not:
> >> >> >> >Assert(gd != NULL);
> >> >> >> >create_ordinary_grouping_paths(root, input_rel, grouped_rel,
> >> >> >> > agg_costs, gd, &extra,
> >> >> >> > &partially_grouped_rel);
> >> >> >> >
> >> >> >>
> >> >> >> I have no idea where you're adding this assert. But simply adding
> it
> >> to
> >> >> >> create_grouping_paths (right before the if condition changed by
> your
> >> >> >> patch) will trigger a failure during initdb. Simply because for
> >> queries
> >> >> >> without grouping sets (and with regular grouping) we pass gs=NULL.
> >> >> >>
> >> >> >> Try applying the attached patch and do "pg_ctl -D ... init" -
> you'll
> >> get
> >> >> >> a failure proving that gd=NULL.
> >> >> >>
> >> >> >Well, I did a test right now, downloaded the latest HEAD and added
> the
> >> >> >Assertive.
> >> >> >I compiled everything, ran the regression tests, installed the
> >> binaries,
> >> >> >loaded the server and installed a client database.
> >> >> >Everything is working.
> >> >> >Maybe in all these steps, there is no grouping that would trigger
> the
> >> code
> >> >> >in question, but I doubt it.
> >> >> >
> >> >>
> >> >> For me doing this leads to reliable crashes when pg_regress does
> initdb
> >> >> (so before the actual checks):
> >> >>
> >> >> patch -p1 < ~/grouping-assert.patch
> >> >> ./configure --enable-debug --enable-cassert
> >> >> make -s clean && make -s -j4 && make check
> >> >>
> >> >> And the "make check" it immediately crashes like this:
> >> >>
> >> >> ============== creating temporary instance
> ==============
> >> >> ============== initializing database system
> ==============
> >> >>
> >> >> pg_regress: initdb failed
> >> >> Examine /home/user/work/postgres/src/test/regress/log/initdb.log
> for
> >> >> the reason.
> >> >>
> >> >> on the assert. So yeah, I'd guess there's something wrong with your
> >> >> build. What does pg_config say?
> >> >>
> >> >Default.
> >> >I never change anything. I simply clone the last head:
> >> >git clone --single-branch https://github.com/postgres/postgres/
> >> >and have it compiled.
> >> >
> >>
> >> Compiled how?
> >>
> >build Debug
> >
>
> What does that mean? Wouldn't it be simpler to just show pg_config
> output, which shows the exact flags used for configure / compile?
>
I'm sorry.

BINDIR = C:/dll/postgres/Debug/PG_CON~1
DOCDIR = /doc
HTMLDIR = /doc
INCLUDEDIR = /include
PKGINCLUDEDIR = /include
INCLUDEDIR-SERVER = /include/server
LIBDIR = /lib
PKGLIBDIR = /lib
LOCALEDIR = /share/locale
MANDIR = /man
SHAREDIR = /share
SYSCONFDIR = /etc
PGXS = /lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = --enable-thread-safety --with-ldap --without-zlib
CC = not recorded
CPPFLAGS = not recorded
CFLAGS = not recorded
CFLAGS_SL = not recorded
LDFLAGS = not recorded
LDFLAGS_EX = not recorded
LDFLAGS_SL = not recorded
LIBS = not recorded
VERSION = PostgreSQL 14devel

msvc 2019 (64 bits)
windows 10 (64bits)
steps:
cd\dll\postgres\src\tools\msvc
build Debug

Maybe, in Windows builds, has another file?

regards,
Ranier Vilela

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-09-20 20:41:12 Re: Yet another fast GiST build
Previous Message Tom Lane 2020-09-20 17:13:16 Re: recovering from "found xmin ... from before relfrozenxid ..."