Re: Why using a partial index is doing slightly more logical I/O than a normal index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "Dirschel, Steve" <steve(dot)dirschel(at)thomsonreuters(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Why using a partial index is doing slightly more logical I/O than a normal index
Date: 2023-05-03 17:37:03
Message-ID: CAMkU=1yJqBB3b4Xr51eUrWWMoJsN0d6G1+8AV+RbVnPZmFgB7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Because both the actual times and the expected costs are so similar to each
other, I am assuming you are asking this as more of an academic question
than a practical one. If it is actually a practical matter, you should
find a better example to present to us.

On Wed, May 3, 2023 at 9:17 AM Dirschel, Steve <
steve(dot)dirschel(at)thomsonreuters(dot)com> wrote:

> Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY
> ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
>
Buffers: shared hit=33
>

For this usage, the =ANY is a "boundary condition". It re-descends the
index for each value in the array, and each of those re-descents incurs
buffer accesses. They cost very little, as the planner thinks they will
mostly be cached already (and indeed, they are), but the difference still
shows up in the Buffers tabulation.

Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY
> ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))
>
> Filter: (deleted_millis <= 0)
>
> Buffers: shared hit=24
>

For this usage, the =ANY is applied as an "in-index filter". It only
descends the index once, to where workflow_id=1070, and then scans forward
applying the =ANY to each index-tuple until it exhausts the =1070
condition. As long as all the =1070 entries fit into just a few buffers,
the count of buffers accessed by doing this is fewer than doing the
re-descents. (Stepping from tuple to tuple in the same index page doesn't
count as a new access. While a re-descent releases and reacquires the
buffer)

This is a bit of speculation on my part, as nothing in the plan output
distinguishes boundary condition usages from in-index-filter usages. This
speculation is based on the fact that I was recently investigating a
similar situation and did extensive work on it with a debugger and by
adding new experimental log messages. Also, It isn't clear to me why it
chooses one usage for one plan and the other usage for the other one in
your case, as it seems that both would be eligible for the "boundary
condition" treatment. But presumably for some reason invisible to us it
just thinks one approach is faster for one index and the other approach for
the other index.

One other thing to note- when using the partial index the cost is .43 ..
> 15824.82. When using the other index the cost is .56 .. 15820.19. So the
> lower end cost (I believe the cost to find the first row) is slightly lower
> for the partial index but the higher end cost (I believe to find the last
> row) is higher for the partial index. Since there is no LIMIT clause why
> wouldn’t the optimizer use the lowest cost to find all rows (which in this
> case would be to use the non-partial index)?
>

There is some code in the planner which deems plans to be tied if their
costs are within a small difference (1%, I think it is). This allows some
branches of the tree of all possible plans to be pruned off early, which
can save a lot of time in planning.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dirschel, Steve 2023-05-03 18:00:26 RE: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index
Previous Message Brainmue 2023-05-03 17:27:35 Re: RHEL repo package crc mismatches