From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-general(at)postgresql(dot)org
Subject: Re: optimizer bent on full table scan
Date: 2003-02-22 17:41:46
Message-ID: 87smugxlet.fsf@stark.dyndns.tv
Lists: pgsql-general
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > I have a strange problem with the optimizer from CVS checked out as of about a
> > week ago. It seems to be insisting on using a full table scan for a table
> > lookup where the full table scan is about 20x slower than an index lookup.
>
> > I think somehow it's being confused by the fact that some values of the index
> > are populated and others haven't been so the distribution is odd.
>
> Well, it's doing a heck of a good job of estimating the number of
> matching rows --- can't complain about 8757 vs 8721. So there's some
> other failure of modeling here. The only idea that comes to mind is
> that maybe the rows matching foobar_id = 900 are tightly clustered in
> the table, so that the planner's assumption of random fetches is overly
> pessimistic. But the small correlation value says that there's not much
> overall structure in the table's ordering. Can you shed any light on
> that?
Hm, that's hard to say. The table was originally populated by a job that loops
through a sequential scan of the parent table and inserts all the records for a
given foobar_id in a single insert. So presumably those records would end up
together, though the foobar_id's might not be in sequential order, which would
skew the correlation.
I guess correlation works well for range scans but isn't really a good
substitute for measuring the "clustering" that's relevant for scans on =.
But even so I'm surprised it's even close. The selectivity is about 2%, and
I've even lowered random_page_cost from the default, so it seems like even
with very scattered records it would still be worthwhile to use an index.
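The arithmetic behind that intuition can be sketched with a simplified textbook cost comparison (these are not PostgreSQL's actual cost functions, and the rows-per-page density is an assumption):

```python
# Back-of-the-envelope cost comparison; rows_per_page is assumed,
# cost constants use PostgreSQL's documented defaults.
total_rows = 406_849
matching_rows = 8_721          # rows with foobar_id = 900
rows_per_page = 100            # assumed heap density
seq_page_cost = 1.0
random_page_cost = 4.0         # PostgreSQL's default

heap_pages = total_rows / rows_per_page

# A sequential scan reads every heap page once, sequentially.
seq_cost = heap_pages * seq_page_cost

# Under the planner's random-placement assumption, each matching row
# may need its own random heap fetch (index I/O itself ignored here).
index_cost = matching_rows * random_page_cost

print(f"selectivity: {matching_rows / total_rows:.1%}")
print(f"seq scan cost:   {seq_cost:,.0f}")
print(f"index scan cost: {index_cost:,.0f}")
```

With those numbers the assumed-random index scan comes out several times more expensive than the sequential scan, which would explain the planner's choice; if the matching rows are actually clustered on a few hundred pages, the real index cost collapses and the plan is wrong.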
slo=> select count(*) from foobartab;
count
--------
406849
(1 row)
Time: 31650.87 ms
slo=> select count(*) from foobartab where foobar_id = 900;
count
-------
8721
(1 row)
Time: 5213.82 ms
--
greg