Re: Hypothetical suggestions for planner, indexing improvement

From: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hypothetical suggestions for planner, indexing improvement
Date: 2003-05-06 15:05:44
Message-ID: 46C15C39FEB2C44BA555E356FBCD6FA4961F99@m0114.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> > The reason that I mention EXISTS is because that's where the lack of
> > cross-column corellation is most dramatic; the planner seems to estimate a
> > flat 50% for EXISTS clauses regardless of the content.
>
> No "seems to" about that one: see
> src/backend/optimizer/path/clausesel.c
>
> else if (is_subplan(clause))
> {
> /*
> * Just for the moment! FIX ME! - vadim 02/04/98
> */
> s1 = (Selectivity) 0.5;
> }

I think the main issue cannot be the correlation in this case.
In this FK PK case an events row exists for each row in the subselect,
so the correlation must be above 1, how much above is irrelevant for the
selectivity estimate (it is only relevant in the below 1 cases).

The selectivity in this case can be estimated by estimating the number of rows
returned from the subselect where the (in the example missing) PK-FK join condition
is removed (here: select distinct event_id from event_day where event_day BETWEEN
'2003-04-08' AND '2003-05-18'). (selectivity = min (1, e. rows of subselect / e. rows of
main select))

The "event_day BETWEEN '2003-04-08' AND '2003-05-18'" is what really reduces the
result set here, and that is not used.

Andreas

PS: in the example the subselect join clause event_id=events.event_id is missing

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Pflug 2003-05-06 15:17:08 contribute pg_get_viewdef2 et al
Previous Message Dave Page 2003-05-06 14:14:57 Re: 7.4 features list