From: | Richard Neill <rn214(at)richardneill(dot)org> |
---|---|
To: | |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Why does the query planner use two full indexes, when a dedicated partial index exists? |
Date: | 2012-12-22 17:29:03 |
Message-ID: | 50D5EDDF.6010702@richardneill.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> I've now installed 9.2. As you said, thanks to the change in 9.2 it
> initially prefers the partial index.
>
> BUT, after 1 cycle of inserting 500k rows, then deleting them all,
> then starting to insert again, I find that the planner has reverted
> to the former bad behaviour.
>
>
> Presumably the real work load has this type of turn over happen one row
> at a time, rather than all in one giant mass update transaction, right?
> That makes a big difference in the way space is re-used.
Sorry - I meant a "real" workload here. I replayed a whole day's worth
of real data into the DB, and that's what I meant by a cycle. Everything
was row-at-a-time.
(It currently takes about an hour to do this)
>
> Reindexing only takes a couple of seconds, and restores correctness.
>
>
> Even your slow query is pretty fast. If you can't afford that, can you
> afford to take an exclusive lock for a couple of seconds every few minutes?
Yes, I can. If that's the root cause, I'll do that. But it seems to me
that I've stumbled upon some rather awkward behaviour that I need to
understand fully, and if the index is bloating that badly and that
quickly, then perhaps it's a PG bug (or at least cause for a logfile
warning).
BTW, The index has gone from 16kB to 4.5MB in 6 hours of runtime today.
It still only has 252 matching rows.
> What's going on? Do I need to run reindex in a cron-job? I thought
> that reindex wasn't "normally" needed, and that index bloat happened
> only after every row had changed value hundreds of times.
>
>
> The partial index is highly leveraged. If every tuple in the table is
> updated once, that amounts to every tuple in the index being updated
> 25,000 times.
How so? That sounds like O(n_2) behaviour.
>
> For the same reason, it is probably not getting vacuum often enough.
> The default settings have the table vacuumed once 20% of its rows
> turns over, but that means the partial index has been turned over many
> many times. You could crank down the auto-vacuum settings for that
> table, or run manual vacuum with a cron job.
>
> Vacuum will not unbloat the index, but if you run it often enough it
> will keep the bloat from getting too bad in the first place.
Thanks. I've reduced autovacuum_vacuum_scale_factor from 0.2 to 0.05
(and set autovacuum_analyze_scale_factor = 0.05 for good measure)
As I understand it, both of these can run in parallel, and I have 7
cores usually idle, while the other is maxed out.
> But what I think I'd do is change one of your full indexes to contain
> the other column as well, and get rid of the partial index. It might
> not be quite as efficient as the partial index might theoretically be,
> but it should be pretty good and also be less fragile.
I'll try that.
Thanks,
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Neill | 2012-12-22 17:46:30 | Re: Why does the query planner use two full indexes, when a dedicated partial index exists? |
Previous Message | Kevin Grittner | 2012-12-21 15:34:13 | Re: Slow queries after vacuum analyze |