Re:

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Scott Morrison" <smorrison(at)navtechinc(dot)com>
Cc: greg(at)turnstep(dot)com, pgsql-novice(at)postgresql(dot)org
Subject: Re:
Date: 2003-02-11 00:41:51
Message-ID: 18267.1044924111@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

"Scott Morrison" <smorrison(at)navtechinc(dot)com> writes:
> Query 1 (Original):
> ANALYZE sample;
> CREATE INDEX sample_id ON sample(id);
> CREATE INDEX sample_date ON sample(date);
> ANALYZE sample;
> EXPLAIN ANALYZE
> SELECT * FROM sample a WHERE (id,date) IN
> (SELECT id,max(date) FROM sample
> WHERE id=a.id AND date<='2003-02-07'
> GROUP BY id);

It's got to be possible to improve on that. The big problem is the IN,
which is a very inefficient construct (and the upcoming 7.4 improvements
won't help any, because they only address cases where the subselect is
uncorrelated, ie, doesn't use variables from the outer query). But it's
easy to get rid of the IN. First off, because the sub-select contains
WHERE id = a.id, there is no need to return id from the sub-SELECT; it
must be equal to the outer a.id, no?

SELECT * FROM sample a WHERE date IN
(SELECT max(date) FROM sample
WHERE id=a.id AND date<='2003-02-07'
GROUP BY id);

For the same reason, we do not need GROUP BY in the sub-select, because
only one group can be formed. This gives us

SELECT * FROM sample a WHERE date IN
(SELECT max(date) FROM sample
WHERE id=a.id AND date<='2003-02-07');

and now we can see that the sub-select returns exactly one row, which
means we can reduce IN to "="

SELECT * FROM sample a WHERE date =
(SELECT max(date) FROM sample
WHERE id=a.id AND date<='2003-02-07');

and now the sub-select is amenable to the LIMIT-and-index trick for
implementing MAX() quickly:

CREATE INDEX fooi on sample (id, date);

SELECT * FROM sample a WHERE date =
(SELECT date FROM sample
WHERE id=a.id AND date<='2003-02-07'
ORDER BY id DESC, date DESC LIMIT 1);

Now this is a pretty decent subplan --- if you try it you should see
a plan like

Seq Scan on sample a (cost=0.00..4330.81 rows=5 width=8)
Filter: (date = (subplan))
SubPlan
-> Limit (cost=0.00..4.31 rows=1 width=8)
-> Index Scan Backward using fooi on sample (cost=0.00..7.18 rows=2 width=8)
Index Cond: ((id = $0) AND (date <= '2003-02-07'::date))

and there isn't any way to make it quicker: the index is taking us
directly to the single value that we want. But the overall performance
will still suck, because we are invoking the subplan once for every row
of the table. And here's where you need to consider rearranging your
schema. The only reason we need to scan all of the 'sample' table is that
we are using the table rows as the source of potential values of 'id' to
plug into the subquery. Are you willing to make a side table showing
the allowed values of 'id'? (You could enforce that it's correct with a
foreign key constraint on 'sample'.) If you are, then this would work:

SELECT a.* FROM sample a, possible_ids p WHERE p.id = a.id
AND date =
(SELECT date FROM sample
WHERE id = p.id AND date <= '2003-02-07'
ORDER BY id DESC, date DESC LIMIT 1);

Experimenting, I get a plan like

Nested Loop (cost=0.00..13477.14 rows=25 width=12)
-> Seq Scan on possible_ids p (cost=0.00..20.00 rows=1000 width=4)
-> Index Scan using fooi on sample a (cost=0.00..9.13 rows=1 width=8)
Index Cond: (("outer".id = a.id) AND (a.date = (subplan)))
SubPlan
-> Limit (cost=0.00..4.31 rows=1 width=8)
-> Index Scan Backward using fooi on sample (cost=0.00..7.18 rows=2 width=8)
Index Cond: ((id = $0) AND (date <= '2003-02-07'::date))
-> Limit (cost=0.00..4.31 rows=1 width=8)
-> Index Scan Backward using fooi on sample (cost=0.00..7.18 rows=2 width=8)
Index Cond: ((id = $0) AND (date <= '2003-02-07'::date))

which looks pretty promising. This should require just one subplan
execution (one index probe) followed by an index probe into the 'sample'
table, for each possible 'id' value. Assuming you have many fewer
distinct 'id' values than you do sample rows, this should win big.

(Note: the reason we see the subplan listed twice is an arcane
implementation detail; in practice only one of the copies will get
executed in this query.)

regards, tom lane

In response to

  • Re: at 2003-02-10 22:46:43 from Scott Morrison

Browse pgsql-novice by date

  From Date Subject
Next Message greg 2003-02-11 02:54:53 Re: Characters To be Escaped in Perl
Previous Message Peter Hunčár 2003-02-11 00:32:31 how to instal libpgtcl ang pgaccess Tcl stuff