Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters

From: Cosmin Prund <cprund(at)gmail(dot)com>
To: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
Date: 2020-01-16 18:15:09
Message-ID: CAGU4dz-jq678=qccivjxJsrfG+jb0H-tLuHnPoJ=6BszDKJikQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Michael and hello again Tom, sorry for mailing you directly. I just
hit Reply in gmail - I expected the emails to have a reply-to=Pgsql.
Apparently they do not.

Running the same query with a different "Ver" produces a proper plan.
Here's a non-redacted example (Ver=91):

EXPLAIN (ANALYZE, BUFFERS) select "IdRand", "IdColoana", "Valoare" from
"LucrareBugetDate" where ("LucrareBugetVersiuneId" = 91) and
("LucrareBugetDateId" in (10,11));

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using
"IX_LucrareBugetDate_LucrareBugetVersiuneId_LucrareBugetDateId" on
"LucrareBugetDate" (cost=0.56..4.95 rows=2 width=13) (actual
time=3.617..3.631 rows=2 loops=1)
Index Cond: (("LucrareBugetVersiuneId" = 91) AND ("LucrareBugetDateId" =
ANY ('{10,11}'::integer[])))
Buffers: shared hit=9 read=3
Planning time: 0.223 ms
Execution time: 3.663 ms
(5 rows)

I have reindex everything, not just this INDEX.

"reltuples" for this table is 41712436.

> I'd be curious of the fraction in the MCVs frequency list in stats
indicates that rows with Ver = 92 are rare and therefore the index on
only Ver column is sufficient to find the rows quickly.

There are 25 valid values for "Ver" in this database. I ran the query for
all of them. The only one miss-behaving is "92". I ran the query with
random values for Ver (invalid values), the query plan always attempts to
use the index using both values.
I looked into "most_common_values" in pg_stats, this value (92) is not in
that list.
Finally I ran "ANALYZE" again and now the problem went away. Running the
query with Ver=92 uses the proper plan. I'm not happy with this - I know I
haven't solved the problem (I've ran ANALYZE multiple times before).

On Thu, 16 Jan 2020 at 19:00, Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> Does the behavior change with different values of Ver column? I'd be
> curious of the fraction in the MCVs frequency list in stats indicates that
> rows with Ver = 92 are rare and therefore the index on only Ver column is
> sufficient to find the rows quickly. What is reltuples for this table by
> the way?
>
> I also wonder if the situation may be helped by re-indexing the "index on
> both columns" to remove any chance of issues on bloat in the index. Which
> order are the columns by the way? If Ver is first, is there also an index
> on only id column?. Since you aren't on v12, you don't get to re-index
> concurrently but I assume you know the work around of create concurrently
> (different name), drop concurrently (old one), and finally rename new index.
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laurenz Albe 2020-01-16 18:20:12 Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters
Previous Message Tom Lane 2020-01-16 17:27:39 Re: Bad query plan decision when using multiple column index - postgresql uses only first column then filters