Re: unoptimized nested loops

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tim Kelly <gtkelly(at)dialectronics(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: unoptimized nested loops
Date: 2022-06-01 21:10:07
Message-ID: CAApHDvqVUpDajMa31Yq-_o4+W7frGLp+6QQD5pJXxCP1oqTOVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 1 Jun 2022 at 08:04, Tim Kelly <gtkelly(at)dialectronics(dot)com> wrote:
> -> Seq Scan on data (cost=0.00..47132.93 rows=125 width=135)
> (actual time=0.542..182952.708 rows=1167810 loops=1)
> Filter: (data ~~ '%some text%'::text)

The problem is coming from the 125 row estimate in the above plan
fragment. Because the number of estimated rows is low, the query
planner likely thinks a Nested Loop join is best.

What I'd do first is verify that some other join method is better by
running the query after having done:

SET enable_nestloop TO off;
<run query>
RESET enble_nestloop;

If the query then runs faster then it's going to be worth doing
something about trying to improve those statistics.

I see the like pattern matching selectivity estimation code does look
at histogram buckets, so you might have luck if you increase the
statistics targets on this column:

ALTER TABLE data ALTER COLUMN data SET STATISTICS 1000;
ANALYZE data;

The standard number of buckets is 100. The above will set it to 1000.
You can go as high as 10000, but going too high is going to slow down
the planner, so you should only go as high as you need to go.

David

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2022-06-02 04:05:46 Re: unoptimized nested loops
Previous Message Bruce Momjian 2022-06-01 20:08:38 Re: Extended multivariate statistics are ignored (potentially related to high null fraction, not sure)