The use of partial, expressional indices in pg < 14

From: Danny Shemesh <dany74q(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: The use of partial, expressional indices in pg < 14
Date: 2022-05-31 10:33:23
Message-ID: CAFZC=QrQ4uahFengvML7nxQC0AoPdLGd2qhdTqFYAy_Ka6AjEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hey everyone,

There's something I've been wondering about - to my understanding,
the planner won't use statistics collected on partial indices, as they may
or may not reflect the correct distribution of data.

When using expressional indices which are also partial, a-la an index on a
nested path of a jsonb column, or a function, with a where clause -
the planner won't actually use the collected statistics, and as such, the
index may end up unused or misused.

Since postgres 14, it's now possible to create extended statistics on an
expression, thus one could create a statistic on the expression used in the
partial index;
In previous versions, I believe the best one can do is either create a full
index on the expression, so the statistics would be utilized, or to extract
the expression to a dedicated column.

Concretely, this came from an index we've had on a relatively large table
(2TB~) on jsonb->some->>path;
the table serves several types of data, where some->>path is only relevant
for data type A, which is only a fraction of the total rows - thus an index
on type = A made lots of sense.

However, the collected statistics weren't used, even if the query contains
type = A, and thus the usage of the index is a bit random,
it's either utilized when it shouldn't, I believe due to the nature of the
hard-code estimation multipliers for the operators we use, or it's unused
when it would've been appropriate.

Two questions that came to mind were:
1. Are there any other actions one could take in pg < 14 (we're in 12,
specifically), avoiding creating a full index and extracting said fields to
a dedicated column ?
2. Would it be theoretically possible to use the collected statistics if
the index where clause is also specified in the query itself ?
or in other words, if the index only contains records where x is not null,
and the query also filters on x is not null, would the partial distribution
not be safe to use ?

Thank you for your time !
Danny

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Fabien COELHO 2022-05-31 11:51:07 Re: psql 15beta1 does not print notices on the console until transaction completes
Previous Message Shaheed Haque 2022-05-31 10:31:31 Re: Is it possible to index "deep" into a JSONB column?