Re: How bad is using queries with thousands of values for operators IN or ANY?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Thorsten Schöning <tschoening(at)am-soft(dot)de>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How bad is using queries with thousands of values for operators IN or ANY?
Date: 2020-08-31 08:51:50
Message-ID: CAFj8pRAWNM5DVeNrDTvUevTEo6Qc8wt6SwkTuKpdk6bkENhXiQ@mail.gmail.com
Lists: pgsql-general

On Mon, 31 Aug 2020 at 10:04, Thorsten Schöning <tschoening(at)am-soft(dot)de>
wrote:

> Hi all,
>
> I have lots of queries in which I need to restrict access to rows
> using some decimal row-ID and am mostly doing so with the operator
> IN in WHERE clauses. Additionally, I'm mostly embedding the IDs as a
> ","-separated list directly into the query, e.g. because I already
> hit the JDBC driver's limitation of ~32k bind parameters[1] for
> Postgres.
>
> I really thought that in most cases simply sending a large number of
> IDs embedded in the query is better than looping, because it saves
> round trips to the DB, the DB's planner has all the information it
> needs to choose the best strategy, etc. OTOH, with recent tests and
> a number of IDs increased by about a factor of 100, I now see
> additional load in Tomcat before the query is even sent, and of
> course in the DB itself as well. I've attached an example query and
> plan.
>
> >   ->  Hash  (cost=242592.66..242592.66 rows=6825 width=39) (actual time=91.117..91.117 rows=40044 loops=3)
> >         Buckets: 65536 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 3016kB
> >         ->  Hash Join  (cost=137.57..242592.66 rows=6825 width=39) (actual time=10.194..82.412 rows=40044 loops=3)
> >               Hash Cond: (meter.meter_bcd = meter_bcd.id)
> >               ->  Index Scan using pk_meter on meter  (cost=0.42..242237.10 rows=40044 width=25) (actual time=9.350..71.276 rows=40044 loops=3)
> >                     Index Cond: (id = ANY ('{[...]}'::integer[]))
> >               ->  Hash  (cost=135.73..135.73 rows=113 width=22) (actual time=0.830..0.830 rows=113 loops=3)
>
> Do you know of any obvious limitations of the JDBC driver in handling
> such large queries? In the end, the query is mostly large text with
> only very few bind parameters.
>
> Do you know of any obvious problem in Postgres itself with such a
> query, e.g. when parsing it? Do things simply take as long as they
> take, roughly comparable to looping, or is there additional overhead
> that grows with the size of the query? My naive expectation is that
> comparing IDs shouldn't care whether they arrive in a loop or all at
> once.
>
> I'm just trying to collect some input for where to look at to optimize
> things in the future. Thanks!
>

It is not good - it increases the memory needed for query parsing, and the
optimizer and executor are slower.

Postgres currently has no optimization for searching in these long lists -
so this search is very slow compared to other methods.
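To illustrate the cost difference in miniature (a sketch of mine, not from
the thread): probing a plain list costs one comparison per element, while a
hash table - like the one the Hash Join in the quoted plan builds - answers
each probe in roughly constant time:

```python
def linear_contains(values, target):
    """Scan the list front to back, counting comparisons - analogous to
    an unoptimized search through a very long IN-list."""
    steps = 0
    for v in values:
        steps += 1
        if v == target:
            return True, steps
    return False, steps

ids = list(range(40_044))  # roughly the row count in the quoted plan
found, steps = linear_contains(ids, 40_043)
assert found and steps == 40_044  # worst case: one comparison per element

# A set (hash table) answers the same probe in ~constant time:
id_set = set(ids)
assert 40_043 in id_set
```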

I think this is a signal that something in the database design or
architecture is wrong. Sure, there can be exceptions, but Postgres has no
optimization for this design.
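One commonly suggested alternative (my sketch, not something this message
prescribes; the `%s` placeholders are psycopg2-style, and `meter` is the
table from the quoted plan) is to pass the whole ID list as a single array
parameter via `= ANY(...)`. That keeps the query text small and constant,
and it also sidesteps the JDBC driver's ~32k bind-parameter limit, because
the array counts as one parameter:

```python
def in_list_query(table, ids):
    """Original pattern: one bind parameter per ID
    (this is what runs into the ~32k JDBC parameter limit)."""
    placeholders = ", ".join(["%s"] * len(ids))
    return f"SELECT * FROM {table} WHERE id IN ({placeholders})", list(ids)

def any_array_query(table, ids):
    """Alternative: a single array parameter, however many IDs there are."""
    return f"SELECT * FROM {table} WHERE id = ANY(%s)", [list(ids)]

ids = list(range(40_044))
sql_in, params_in = in_list_query("meter", ids)
sql_any, params_any = any_array_query("meter", ids)
assert sql_in.count("%s") == 40_044   # one placeholder per ID
assert sql_any.count("%s") == 1       # one placeholder total
```

With JDBC the equivalent single parameter can be built with
`Connection.createArrayOf("integer", ids)` and bound via
`PreparedStatement.setArray`.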

Regards

Pavel

> [1]: https://github.com/pgjdbc/pgjdbc/issues/90
>
> Mit freundlichen Grüßen,
>
> Thorsten Schöning
>
> --
> Thorsten Schöning E-Mail: Thorsten(dot)Schoening(at)AM-SoFT(dot)de
> AM-SoFT IT-Systeme http://www.AM-SoFT.de/
>
> Telefon...........05151- 9468- 55
> Fax...............05151- 9468- 88
> Mobil..............0178-8 9468- 04
>
> AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
> AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow
