From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Re: Performance issue in foreign-key-aware join estimation |
Date: | 2019-03-18 01:06:19 |
Message-ID: | CAKJS1f8YLCxx2=XS+VP0niWN_oYw_62KjL0eSL5iOkqnh-Ww2A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for looking at this.
On Mon, 18 Mar 2019 at 10:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I looked at this for a little bit. I'm on board with the basic idea
> of assigning integer indexes to the ECs and keeping bitmapsets of
> relevant EC indexes in RelOptInfos. However ... man, is that
> delete_eclass() thing ugly. And slow, and fragile-feeling.
Yeah. When I discovered we remove eclasses from the list I was a
little annoyed as the code was pretty simple before having to account
for that. I'll grant you ugly and slow, but I don't quite see the
fragile part.
> I think that maybe we could improve that situation by not trying to
> maintain the RelOptInfo.eclass_indexes sets at all during the initial
> construction of the ECs, but only setting them up after we've completed
> doing that. We already have an assumption that we know when EC merging
> is done and it's safe to make pathkeys that reference the "canonical"
> (merged) ECs, so it seems like that would be a point where we could
> build the indexing bitmapsets safely. This hinges on not needing the
> bitmapsets till after that point, but it'd be easy to arrange some
> Asserts verifying that we don't try to use them before that.
I don't think that's really that easily workable. Consider, for
example, when add_child_rel_equivalences() is called, this is well
after the location I think you have in mind. Probably this function
should be modified to use the indexes, but it must also update the
indexes too.
> If that doesn't work (because we need the index info sooner), maybe we
> could consider never removing ECs from eq_classes, so that their indices
> never change, then teaching most/all of the loops over eq_classes to
> ignore entries with non-null ec_merged. But we probably want the indexing
> bitmapsets to reflect canonical EC numbers, so we'd still have to do some
> updating I fear -- or else be prepared to chase up the ec_merged links
> when using the index bitmaps.
That's probably a better solution. Perhaps we can continue to nullify
the ec_members, then just skip eclasses with a NIL ec_members. I
avoided that in the patch because I was worried about what extension
might be doing, but if you think it's okay, then I can change the
patch.
> Stepping back a little bit, I wonder whether now is the time to rethink
> the overall EC data structure, as foreseen in this old comment:
>
> * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
> * problem, for which there exist better data structures than simple lists.
> * If this code ever proves to be a bottleneck then it could be sped up ---
> * but for now, simple is beautiful.
I've thought for a while now that it wouldn't be too hard to have
equal(), or some version of it return -1, 0, 1 to allow us to binary
search or build a binary search tree of nodes. I imagine it wouldn't
be too hard to create a compact binary search tree inside an array
with indexes to the left and right sub-tree instead of pointers. That
would likely be a bit more cache friendly and allow simple
non-recursive traversals of the entire tree. However, that does not
sound like something this patch should be doing.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Jamison, Kirk | 2019-03-18 01:08:34 | RE: Timeout parameters |
Previous Message | Peter Geoghegan | 2019-03-18 00:59:25 | Re: Making all nbtree entries unique by having heap TIDs participate in comparisons |