Re: Why does the query planner use two full indexes, when a dedicated partial index exists?

From: Richard Neill <rn214(at)richardneill(dot)org>
To:
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Date: 2012-12-22 17:29:03
Message-ID: 50D5EDDF.6010702@richardneill.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


> I've now installed 9.2. As you said, thanks to the change in 9.2 it
> initially prefers the partial index.
>
> BUT, after 1 cycle of inserting 500k rows, then deleting them all,
> then starting to insert again, I find that the planner has reverted
> to the former bad behaviour.
>
>
> Presumably the real work load has this type of turn over happen one row
> at a time, rather than all in one giant mass update transaction, right?
> That makes a big difference in the way space is re-used.

Sorry - I meant a "real" workload here. I replayed a whole day's worth
of real data into the DB, and that's what I meant by a cycle. Everything
was row-at-a-time.
(It currently takes about an hour to do this)

>
> Reindexing only takes a couple of seconds, and restores correctness.
>
>
> Even your slow query is pretty fast. If you can't afford that, can you
> afford to take an exclusive lock for a couple of seconds every few minutes?

Yes, I can. If that's the root cause, I'll do that. But it seems to me
that I've stumbled upon some rather awkward behaviour that I need to
understand fully, and if the index is bloating that badly and that
quickly, then perhaps it's a PG bug (or at least cause for a logfile
warning).

BTW, The index has gone from 16kB to 4.5MB in 6 hours of runtime today.
It still only has 252 matching rows.

> What's going on? Do I need to run reindex in a cron-job? I thought
> that reindex wasn't "normally" needed, and that index bloat happened
> only after every row had changed value hundreds of times.
>
>
> The partial index is highly leveraged. If every tuple in the table is
> updated once, that amounts to every tuple in the index being updated
> 25,000 times.

How so? That sounds like O(n_2) behaviour.

>
> For the same reason, it is probably not getting vacuum often enough.
> The default settings have the table vacuumed once 20% of its rows
> turns over, but that means the partial index has been turned over many
> many times. You could crank down the auto-vacuum settings for that
> table, or run manual vacuum with a cron job.
>
> Vacuum will not unbloat the index, but if you run it often enough it
> will keep the bloat from getting too bad in the first place.

Thanks. I've reduced autovacuum_vacuum_scale_factor from 0.2 to 0.05
(and set autovacuum_analyze_scale_factor = 0.05 for good measure)

As I understand it, both of these can run in parallel, and I have 7
cores usually idle, while the other is maxed out.

> But what I think I'd do is change one of your full indexes to contain
> the other column as well, and get rid of the partial index. It might
> not be quite as efficient as the partial index might theoretically be,
> but it should be pretty good and also be less fragile.

I'll try that.

Thanks,

Richard

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Neill 2012-12-22 17:46:30 Re: Why does the query planner use two full indexes, when a dedicated partial index exists?
Previous Message Kevin Grittner 2012-12-21 15:34:13 Re: Slow queries after vacuum analyze