Re: Partial index creation always scans the entire table

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mike Sofen" <msofen(at)runbox(dot)com>
Cc: "'Laurenz Albe'" <laurenz(dot)albe(at)cybertec(dot)at>, "'MingJu Wu'" <mingjuwu0505(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Partial index creation always scans the entire table
Date: 2020-02-17 01:09:00
Message-ID: 22504.1581901740@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Mike Sofen" <msofen(at)runbox(dot)com> writes:
>> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Sent: Sunday, February 16, 2020 7:30 AM
>>> I've seen people try to do this before. I wonder if the manual page about
>>> partial indexes should explicitly say "don't do that".

> Yes please (seriously). The utter beauty of Postgres is the flexibility and
> power that its evolutionary path has allowed/created. The tragic danger is
> that the beauty is fairly easy to misapply/misuse. Caveats in the
> documentation would be very beneficial to both seasoned practitioners and
> newcomers - it is quite challenging to keep up with everything Postgres and
> the documentation is where most of us turn for guidance.

OK, so how about something like this added to section 11.8
(no pretty markup as yet):

Example 11.4. Do Not use Partial Indexes as a Substitute for Partitioning

You might be tempted to create a large set of non-overlapping partial
indexes, for example

CREATE INDEX mytable_cat_1 ON mytable (data) WHERE category = 1;
CREATE INDEX mytable_cat_2 ON mytable (data) WHERE category = 2;
CREATE INDEX mytable_cat_3 ON mytable (data) WHERE category = 3;
...

This is a bad idea! Almost certainly, you'll be better off with a single
non-partial index, declared like

CREATE INDEX mytable_cat_data ON mytable (category, data);

(Put the category column first, for the reasons described in section 11.3
Multicolumn Indexes.) While a search in this larger index might have to
descend through a couple more tree levels than a search in a smaller
index, that's almost certainly going to be cheaper than the planner effort
needed to select the appropriate one of the partial indexes. The core of
the problem is that the system does not understand the relationship among
the partial indexes, and will laboriously test each one to see if it's
applicable to the current query.

If your table is large enough that a single index really is a bad idea,
you should look into using partitioning instead (section whatever-it-is).
With that mechanism, the system does understand that the tables and
indexes are non-overlapping, so much better performance is possible.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Laurenz Albe 2020-02-17 09:53:21 Re: SubtransControlLock and performance problems
Previous Message Justin Pryzby 2020-02-17 00:52:26 Re: Partial index creation always scans the entire table