Re: WHERE col = ANY($1) extended to 2 or more columns?

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: WHERE col = ANY($1) extended to 2 or more columns?
Date: 2023-02-11 22:00:31
Message-ID: 20230211220031.cdrrb5dnkiu6hden@hjp.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2023-02-11 16:21:49 +0100, Peter J. Holzer wrote:
> On 2023-02-09 18:35:42 +0100, Dominique Devienne wrote:
> > Right. The goal is to (re)use a prepared statement (i.e. plan once), and bind
> > the RHS (binary) array
> > and do a single exec (single round-trip) to get the matching rows. AFAIK, this
> > is the fastest way.
>
> Planning time is often much less than execution time, so minimizing it
> may not give you the fastest results.
>
> For example I tried to fetch two data points from a medium sized (3 mio
> rows) table ((macrobondtimeseries, date) is the primary key here):
>
> wds=> explain (analyze) select macrobondtimeseries, date, value
> from facttable_oecd_naq_2018
> where (macrobondtimeseries, date) = any(
> array [
> ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
> ( 'naq_lux_p3s13_gpsa_a' , '1961-01-01' )
> ]::tsd[]
> );
[Sequential scan]
> ║ Planning Time: 1.833 ms
> ║ Execution Time: 1619.729 ms
> ╚════════════════════════════════════════════════════════════════════
> (12 rows)
>
> wds=> explain (analyze) select macrobondtimeseries, date, value
> from facttable_oecd_naq_2018
> where (macrobondtimeseries, date) in
> (
> ( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
> ( 'naq_lux_p3s13_gpsa_a' , '1961-01-01' )
> );
[Bitmap index scan]
> ║ Planning Time: 1.414 ms
> ║ Execution Time: 0.330 ms
> ╚════════════════════════════════════════════════════════════════════
> (10 rows)
>
> The latter is almost 1000 times faster. Saving 1.8 ms on planning time
> doesn't help you if you then waste 1.6 s on execution.
>
> So sometimes it pays off to give the planner a little bit of extra
> information to work on.

And here is a variant which does achieve a good execution
plan with an array:

wds=> explain(analyze) with a as (
select * from unnest (
array [
( 'naq_mex_b1_gi_cqr_a' , '2013-01-01' ),
( 'naq_lux_p3s13_gpsa_a' , '1961-01-01' )
]::tsd[]
)
)
select macrobondtimeseries, date, value
from a natural join facttable_oecd_naq_2018;
╔═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╟─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Nested Loop (cost=0.56..7.17 rows=1 width=44) (actual time=0.075..0.115 rows=2 loops=1) ║
║ -> Function Scan on unnest (cost=0.00..0.02 rows=2 width=36) (actual time=0.011..0.012 rows=2 loops=1) ║
║ -> Index Scan using facttable_oecd_naq_2018_pkey on facttable_oecd_naq_2018 (cost=0.56..3.57 rows=1 width=34) (actual time=0.045..0.045 rows=1 loops=2) ║
║ Index Cond: (((macrobondtimeseries)::text = (unnest.macrobondtimeseries)::text) AND (date = unnest.date)) ║
║ Planning Time: 1.061 ms ║
║ Execution Time: 0.172 ms ║
╚═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
(6 rows)

(At least with these parameters on this table with this version of
PostgreSQL. Your mileage may vary, of course.)

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp(at)hjp(dot)at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Christian Ramseyer (mx04) 2023-02-12 00:13:52 Re: pg_trgm vs. Solr ngram
Previous Message Peter J. Holzer 2023-02-11 15:21:49 Re: WHERE col = ANY($1) extended to 2 or more columns?