Quick Links

Re: Perfomance of IN-clause with many elements and possible solutions

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Dmitry Lazurkin <dilaz03(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Perfomance of IN-clause with many elements and possible solutions
Date:	2017-07-31 17:53:19
Message-ID:	CAMkU=1xjPzenssKktcX3pqTt9EpJd6nECAHLxRg8_fgGc9VT_Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wr
>
>
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
> QUERY PLAN
> ------------------------------------------------------------
> -----------------------------------------------------------------------
> Aggregate (cost=245006.16..245006.17 rows=1 width=8) (actual
> time=3550.581..3550.581 rows=1 loops=1)
> Execution time: 3550.700 ms
>
>

>
> regression=# set enable_hashagg TO 0;
> regression=# set enable_sort TO 0;
> SET
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
> QUERY PLAN
> ------------------------------------------------------------
> -------------------------------------------------------------------
> Aggregate (cost=320003.90..320003.91 rows=1 width=8) (actual
> time=3548.364..3548.364 rows=1 loops=1)
> Execution time: 3548.463 ms
>
>

> At least in this example, the actual runtimes are basically identical
> regardless, so there is no great point in sweating over it.
>

Since The run times are equal, but one is estimated to be 30% more
expensive, I think there is at least some little reason to sweat over it.

Incidentally, I accidentally ran this against a server running with your
patch from
https://www.postgresql.org/message-id/10078.1471955305@sss.pgh.pa.us. On
that server, it did choose the semi-join. But I have no idea why, as it
seems like the effect of that patch would have been to change the distinct
estimate from the magic hard-coded 200, to the natural 200 coming from the
query itself. Why would that affect the cost?

Cheers,

Jeff

In response to

Re: Perfomance of IN-clause with many elements and possible solutions at 2017-07-24 22:46:44 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Jeff Janes	2017-07-31 18:03:32	Re: Perfomance of IN-clause with many elements and possible solutions
Previous Message	Ed Behn	2017-07-31 17:52:29	Partitioned TEMP tables