From: | "Shane Wright" <shane(dot)wright(at)edigitalresearch(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Using the wrong index (very suboptimal), why? |
Date: | 2008-12-23 14:17:40 |
Message-ID: | 952015000A644E44B2FB8759E151093B01328273@exchange11.ad.edigitalresearch.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Tom,
>> - increasing the statistics target to the maximum setting with SET
>> STATISTICS 1000 on columns rid, nid and iid1 of answers, then
>> re-vacuuming.
> I hope you meant re-analyzing.
Hehe absolutely yes; I always VACUUM VERBOSE ANALYSE :)
>> Index Scan using index_answers_nidiid1 on answers
>> (cost=0.00..114.14 rows=1 width=60) (actual time=0.206..2398.645
>> rows=21891 loops=1)
>> Index Cond: ((nid = 253869913) AND (iid1 = 535292129))
>> Total runtime: 2424.769 ms
>Well, here's the problem all right: 1 row estimated vs 21891 actual.
>Is there something odd about the joint distribution of these two
>columns?
Good :) - I've been chasing the same thing, albeit not with any luck yet
:(
Shouldn't be anything odd about the data I wouldn't have thought...
There are ~670 million rows. No nulls in nid, ~6% of iid1 are null
(they will always be null or not null for a given nid. I.e. rows with a
given nid value will either all be null or all be not null).
nids are randomly selected, there are only ~27000 distinct values, all
between 100000000 and 999999999.
iid1s are also random in the same range, ~50000 distinct values. All
the rows for a given value of nid will have one of a small set of
possible iid1 values; usually 5-15 distinct values. The frequency at
which each nid may occurs is quite uneven; some will be tens of times,
others will be a couple of hundred thousand. Same applies to
corresponding iid1 values.
(the table stores answers to questions; nid is the question ID, iid1 is
the answer ID [for questions where the user picks from a pre-defined
list] - iid1 is NULL for textual answers. iid1 values are grouped into
sets of options the user can pick from, defined elsewhere. These sets
can be shared across questions/nids but not often)
Does the above sound strange?
There aren't any strange errors from the database, autovacuum is enabled
(but not vacuuming the table often enough). This problem was triggered
this time when I manually vacuumed the table - which cleared a lot of
dead rows (again, I always VACUUM VERBOSE ANALYSE).
A separate installation with a similar data set (actually the same but
older; seems the most comparable) estimated 10 rows returned for the
same query.
From | Date | Subject | |
---|---|---|---|
Next Message | Shane Wright | 2008-12-23 14:28:58 | Re: Using the wrong index (very suboptimal), why? |
Previous Message | Jais Mathew | 2008-12-23 14:03:20 | having two database clusters? |