Quick Links

Re: Suggestions wanted for 7.2.4 query

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Suggestions wanted for 7.2.4 query
Date:	2003-05-04 17:59:41
Message-ID:	200305041059.41468.josh@agliodbs.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Tom,

> I have to take that back (must have been out too late last night ;-)).
> The EXISTS subquery *is* getting pushed down to become a restriction on
> events alone; that's what the "SubPlan" is. However, it'd still be
> worth looking for another way to express it, because the planner is
> pretty clueless about the selectivity of EXISTS restrictions. That's
> what's causing it to drastically overestimate the number of rows taken
> from "events" (14812 vs 1919), which in turn drives it away from using
> the nestloop-with-inner-indexscan join style for joining to "cases".

That may be solvable without forcing a seq scan on "events", simply by
overdetermining the criteria on date. That is, I can't apply the date
criteria to "events" because that would require running date calucations on
each row forcing a seq scan ( i.e. (event_date + duration) between date_one
and date_two would require a seq scan), but I can apply a broadend version of
the criteria to "events" ( i.e. event_date between (date_one - 1 month) and
(date_two + 1 day)) which would give the planner the idea that it is
returning a minority of rows from "events".

Someday, we have to come up with a way of indexing simple multi-column
calculations. Unless someone did that in current source while I was behind
on -hackers?

> Right. The nestloop/indexscan style only wins if there are not too many
> outer rows. If the EXISTS constraint actually did succeed for 14812
> "events" rows, the planner would probably be making the right choice to
> use a hash join.

Hmm. Any hope of improving this in the future? Like the IN() functionality
improvements in 7.4?

> BTW, have you tried lowering the value of "random_page_cost"? Looking
> at the relative costs in these examples makes me think most of your
> tables are cached in memory. Of course, if that's not true during
> day-to-day production then you need to be wary about reducing the setting.

No, we're probably cached ... the machine has 1gb of RAM. Also it has a
really fast RAID array, at least for block disk reads, although random seek
times suck. I can tweak a little. The problem is that it's a production
machine in use 70 hours a week, so there isn't a lot of time we can test
performance settings that might cause problems.

Thanks for the advice!

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Re: Suggestions wanted for 7.2.4 query at 2003-05-04 17:23:09 from Tom Lane

Responses

Re: Suggestions wanted for 7.2.4 query at 2003-05-05 17:59:11 from Josh Berkus

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Reiner Dassing	2003-05-05 12:16:27	Indices are not used by the optimizer
Previous Message	Tom Lane	2003-05-04 17:23:09	Re: Suggestions wanted for 7.2.4 query