From: | Mark Stosberg <mark(at)summersault(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server) |
Date: | 2007-02-14 16:28:38 |
Message-ID: | eqvdbm$1c8p$1@news.hub.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Merlin Moncure wrote:
> On 2/14/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are two things wrong here: first, that the estimated row
>> count is only 20% of actual; it should certainly not be that far
>> off for such a simple condition. I wonder if your vacuum/analyze
>> procedures are actually working. Second, you mentioned somewhere
>> along the line that 'available' pets are about 10% of all the
>> entries, which means that this indexscan is more than likely
>> entirely counterproductive: it would be cheaper to ignore this
>> index altogether.
Tom,
Thanks for the generosity of your time. We are using 8.1.3 currently. I
have read there are some performance improvements in 8.2, but we have
not started evaluating that yet.
Your suggestion about the pet_state index was right on. I tried
"Analyze" on it, but still got the same bad estimate. However, I then
used "reindex" on that index, and that fixed the estimate accuracy,
which made the query run faster! The cube search now benchmarks faster
than the old search in production, taking about 2/3s of the time of the
old one.
Any ideas why the manual REINDEX did something that "analyze" didn't? It
makes me wonder if there is other tuning like this to do.
Attached is the EA output from the most recent run, after the "re-index".
> I think switching the index on pet_state to a composite on (pet_state,
> species_id) might help too.
>
> or even better:
>
> create function is_pet_available(text) returns bool as
> $$
> select $1='available';
> $$ language sql immutable;
>
> create index pets_available_species_idx on
> pets(is_pet_available(pet_state), species_id);
Merlin,
Thanks for this suggestion. It is not an approach I had used before, and
I was interested to try it. However, the new index didn't get chosen.
(Perhaps I would need to drop the old one?) However, Tom's suggestions
did help. I'll follow up on that in just a moment.
>
> refactor your query something similar to:
>
> SELECT * FROM
> (
> SELECT
> earth_coords(q.earth_coords, s.earth_coords)/1609.344 as radius
> FROM pets
> JOIN shelters_active as shelters USING (shelter_id)
> JOIN zipcodes s ON shelters.postal_code_for_joining = zipcodes.zipcode
> JOIN zipcodes q ON q.zipcode = '90210'
> WHERE
> is_pet_available(pet_state)
> AND species_id = 1
> AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
> ) p order by radius
>
> merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
Attachment | Content-Type | Size |
---|---|---|
ea.txt | text/plain | 2.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kenji Morishige | 2007-02-14 17:25:01 | Re: quad or dual core Intel CPUs |
Previous Message | Merlin Moncure | 2007-02-14 16:20:53 | Re: Benchmarking PGSQL? |