From: | Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #5885: Strange rows estimation for left join |
Date: | 2011-02-16 09:24:16 |
Message-ID: | AANLkTi=QZyX4K6Bk1_8u=Kn_Ms13i-w61x6oLAgJJej9@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Wed, Feb 16, 2011 at 6:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> writes:
> > Test case look like:
>
> > create table "references" ( attr_id integer, reference integer,
> > object_id integer );
> > insert into "references" select *100**(random()),
> > *100000**(random()^*10*), *1000000**(random()) from
> > generate_series(*1*,*10000000*);
> > create index xif01references on "references" ( reference, attr_id );
> > create index xif02references on "references" ( object_id, attr_id,
> reference );
>
> > analyze "references";
>
> > explain select * from "references" rs left join "references" vm on
> > vm.reference = rs.reference and vm.attr_id = *10* where rs.object_id =
> > *1000*;
>
> I don't believe there's actually anything very wrong here. The
> large-looking estimate for the join size is not out of line: if you try
> different values for object_id you will find that some produce more rows
> than that and some produce less. If we had cross-column stats we could
> maybe derive a better estimate, but as-is you're getting an estimate
> that is probably about right on the average, depending on whether the
> particular object_id matches to more common or less common reference
> values.
>
> The thing that looks funny is that the inner indexscan rowcount estimate
> is so small, which is because that's being done on the assumption that
> the passed-in rs.reference value is random. It's not really --- it's
> more likely to be one of the more common reference values --- which is
> something that's correctly accounted for in the join size estimate but
> not in the inner indexscan estimate.
>
> regards, tom lane
>
Thank you very much for answer.
Are I correct in my assumption:
estimated row counts in both sides of the join isn't related to estimated
resulting row count of the join because they are calculated independently?
If that assumption correct than which values used to select between nested
loop and merge/hash joins (estimated resulting join rows or estimated row
counts on each sides of the join)?
I asking because in some cases these two values can lead to different plans.
PS: I just calculated how many questions I had in mail lists about
postgresql planner. Look like it's time to me dust off my memory about C and
start read planner code by myself. :)
Kind Regards, Maxim
From | Date | Subject | |
---|---|---|---|
Next Message | Konrad Garus | 2011-02-16 10:58:28 | BUG #5889: "Intersects" for polygons broken |
Previous Message | Warstone | 2011-02-15 20:59:58 | Re[2]: BUG #5887: EXPLAIN returns more than 1000 rows when table is empty |