Re: Extremely inefficient merge-join

From:	Marcin Gozdalik <gozdal(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Extremely inefficient merge-join
Date:	2021-03-17 21:27:18
Message-ID:	CADu1mROz6ZTspcHWgTZKi8zmS64wVy+QV-Wx_38NJFSX9QrA0A@mail.gmail.com
Lists:	pgsql-performance

dir_current changes often, but it is analyzed after significant changes, so
in practice it gets analyzed roughly once an hour.
The ratio of rows with volume_id=5 to the total number of rows stays roughly
constant (volume_id=5 appears in roughly 1.5M-2M rows, out of a total of
around 750-800M rows).
dir_process is created once, analyzed, and does not change afterwards.

Assuming dir_process is the outer side in the plans shown here, it consists
only of duplicates, i.e. all of its rows have volume_id=5 in this example.
Do you think anything could be changed in the query itself?
Any hints would be appreciated.
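
As a rough illustration of the duplicate-outer-row effect discussed in this
thread, here is a minimal Python sketch of a merge join that rewinds the
inner side for every outer row carrying the same key. It is not PostgreSQL's
executor code: the function name merge_join_stats and the tiny row counts
are hypothetical, and it assumes the merge key is effectively just volume_id.

```python
def merge_join_stats(outer_keys, inner_keys):
    """Merge-join two ascending key lists.

    Returns (joined_rows, inner_rows_read). Illustrative only: a real
    executor works on tuples, not bare keys.
    """
    joined = 0
    inner_reads = 0
    mark = 0  # start of the inner run that matches the current key
    for key in outer_keys:
        # Skip inner rows whose key is smaller than the outer key.
        while mark < len(inner_keys) and inner_keys[mark] < key:
            mark += 1
            inner_reads += 1
        # Rewind to the mark: every outer row with the same key walks
        # the whole matching inner run again.
        pos = mark
        while pos < len(inner_keys) and inner_keys[pos] == key:
            pos += 1
            inner_reads += 1
            joined += 1
    return joined, inner_reads


# Scaled-down stand-in for "every outer row has volume_id=5".
print(merge_join_stats([5] * 200, [5] * 300))  # (60000, 60000)
# Same outer row count with distinct keys, for comparison.
print(merge_join_stats(list(range(200)), list(range(300))))
```

With only duplicates on the outer side, the work grows with the product of
the two matching run lengths instead of their sum, which is the kind of
blow-up seen in the plans.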

On Wed, 17 Mar 2021 at 20:47, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Marcin Gozdalik <gozdal(at)gmail(dot)com> writes:
> > Sometimes Postgres will choose very inefficient plan, which involves
> > looping many times over same rows, producing hundreds of millions or
> > billions of rows:
>
> Yeah, this can happen if the outer side of the join has a lot of
> duplicate rows. The query planner is aware of that effect and will
> charge an increased cost when it applies, so I wonder if your
> statistics for the tables being joined are up-to-date.
>
> regards, tom lane
>

--
Marcin Gozdalik

In response to

Re: Extremely inefficient merge-join at 2021-03-17 20:47:35 from Tom Lane
