Re: Helping planner to chose sequential scan when it improves performance

From: Ruslan Zakirov <ruslan(dot)zakirov(at)gmail(dot)com>
To: "Patrick O'Toole" <patrick(dot)otoole(at)sturdy(dot)ai>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Helping planner to chose sequential scan when it improves performance
Date: 2023-06-14 23:00:12
Message-ID: CAMOxC8sJEVujzyToR_wgeAERZ1AJDhU4fs92xY6MOUw0mGPqYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Tue, Jun 13, 2023 at 10:28 PM Patrick O'Toole <patrick(dot)otoole(at)sturdy(dot)ai>
wrote:

> Hi all,
>

> Questions:
>
> 1. In Plan A, what factors are causing the planner to select a
> substantially slower plan despite having recent stats about number of rows?
>
> Estimated overall cost. For Plan A it is ~200k. For plans B/C (haven't
noticed any differences in these two) it is ~250k. The planner uses a less
expensive plan.

Also, in the plans you can see that Pg estimates the number of rows
correctly.

>
> 1. Is there a substantial difference between the on-the-fly hash done
> in Plan B and Plan C compared to the hash-index used in Plan A? Can I
> assume they are essentially the same? Perhaps there are there differences
> in how they're applied?
>
> I don't see any difference in plans B and C, but you report timing
changes. To me this looks like just a fluctuation in measurements. So I
wouldn't trust any measurements for plan A either.

I'm not a big expert, but can not say that plan A and B are essentially the
same.

Plan A: DB scans item_text table and for every record looks into the index
of conversation_item table, then looks into the table itself.

Plan B/C: DB scans conversation_item table without looking into its indexes
building a hash table on the fly.

> 1. Is it common to see values for random_page_cost set as high as 8.0?
> We would of course need to investigate whether we see a net positive or net
> negative impact on other queries, to adopt this as a general setting, but
> is it a proposal we should actually consider?
>
> No idea.

>
> 1. Maybe we are barking up the wrong tree with the previous questions.
> Are there other configuration parameters we should consider first to
> improve performance in situations like the one illustrated?
>
> Recheck your numbers.

>
> 1. Are there other problems with our schema, query, or plans shown
> here? Other approaches (or tools/analyses) we should consider?
>
> You can try the following index:

CREATE INDEX conversation_item_ruz1 ON conversation_item(item_uuid,
conversation_uuid, tenant_id);

I believe this index would allow Pg to use "index only scan" as variation
of Plan A and avoid touching the conversation_item table completely.

--
Best regards, Ruslan.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Karsten Hilbert 2023-06-15 18:15:46 Re: Reproducing incorrect order with order by in a subquery
Previous Message Ruslan Zakirov 2023-06-14 21:58:55 Re: Reproducing incorrect order with order by in a subquery

Browse pgsql-performance by date

  From Date Subject
Next Message James Pang (chaolpan) 2023-06-15 08:32:41 RE: Postgresql equal join on function with columns not use index
Previous Message Patrick O'Toole 2023-06-13 19:24:51 Helping planner to chose sequential scan when it improves performance