From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | David Kubečka <kubecka(dot)dav(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Weirdly pesimistic estimates in optimizer |
Date: | 2015-03-06 16:58:42 |
Message-ID: | 21909.1425661122@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I wrote:
> I chewed on this for awhile and decided that there'd be no real harm in
> taking identification of the unique expressions out of
> create_unique_path() and doing it earlier, in initsplan.c; we'd need a
> couple more fields in SpecialJoinInfo but that doesn't seem like a
> problem. However, rel->rows is a *big* problem; we simply have not made
> any join size estimates yet, and can't, because these things are done
> bottom up.
> However ... estimate_num_groups's dependency on its rowcount input is not
> large (it's basically using it as a clamp). So conceivably we could have
> get_loop_count just multiply together the sizes of the base relations
> included in the semijoin's RHS to get a preliminary estimate of that
> number. This would be the right thing anyway for a single relation in the
> RHS, which is the most common case. It would usually be an overestimate
> for join RHS, but we could hope that the output of estimate_num_groups
> wouldn't be affected too badly.
Attached is a draft patch that does those two things. I like the first
part (replacing SpecialJoinInfo's rather ad-hoc join_quals field with
something more explicitly attuned to semijoin uniqueness processing).
The second part is still pretty much of a kluge, but then get_loop_count
was a kluge already. This arguably makes it better.
Now, on the test case you presented, this has the unfortunate effect that
it now reliably chooses the "wrong" plan for both cases :-(. But I think
that's a reflection of poor cost parameters (ie, test case fits handily in
RAM but we've not set the cost parameters to reflect that). We do get the
same rowcount and roughly-same cost estimates for both the random_fk_dupl
and random_fk_uniq queries, so from that standpoint it's doing the right
thing. If I reduce random_page_cost to 2 or so, it makes the choices you
wanted.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
adjust-loop-count-for-semijoins.patch | text/x-diff | 39.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2015-03-06 17:19:36 | Re: pg_upgrade and rsync |
Previous Message | Tom Lane | 2015-03-06 16:48:40 | Re: Clamping reulst row number of joins. |