Re: Perfomance of IN-clause with many elements and possible solutions

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Dmitry Lazurkin <dilaz03(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Perfomance of IN-clause with many elements and possible solutions
Date: 2017-07-31 17:53:19
Message-ID: CAMkU=1xjPzenssKktcX3pqTt9EpJd6nECAHLxRg8_fgGc9VT_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wr
>
>
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
> QUERY PLAN
> ------------------------------------------------------------
> -----------------------------------------------------------------------
> Aggregate (cost=245006.16..245006.17 rows=1 width=8) (actual
> time=3550.581..3550.581 rows=1 loops=1)
> Execution time: 3550.700 ms
>
>

>
> regression=# set enable_hashagg TO 0;
> regression=# set enable_sort TO 0;
> SET
> regression=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE id
> IN
> :values_clause;
> QUERY PLAN
> ------------------------------------------------------------
> -------------------------------------------------------------------
> Aggregate (cost=320003.90..320003.91 rows=1 width=8) (actual
> time=3548.364..3548.364 rows=1 loops=1)
> Execution time: 3548.463 ms
>
>

> At least in this example, the actual runtimes are basically identical
> regardless, so there is no great point in sweating over it.
>

Since The run times are equal, but one is estimated to be 30% more
expensive, I think there is at least some little reason to sweat over it.

Incidentally, I accidentally ran this against a server running with your
patch from
https://www.postgresql.org/message-id/10078.1471955305@sss.pgh.pa.us. On
that server, it did choose the semi-join. But I have no idea why, as it
seems like the effect of that patch would have been to change the distinct
estimate from the magic hard-coded 200, to the natural 200 coming from the
query itself. Why would that affect the cost?

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2017-07-31 18:03:32 Re: Perfomance of IN-clause with many elements and possible solutions
Previous Message Ed Behn 2017-07-31 17:52:29 Partitioned TEMP tables