Re: "memory exhausted" in query parser/simplifier for many nested parentheses

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Niklas Hambüchen <mail(at)nh2(dot)me>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org, ruben(at)benaco(dot)com, Niklas Hambüchen <niklas(at)benaco(dot)com>
Subject: Re: "memory exhausted" in query parser/simplifier for many nested parentheses
Date: 2024-12-13 15:11:00
Message-ID: cb8fa16c-0141-4569-8599-d009e0f1d96d@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 12/13/24 15:44, Greg Sabino Mullane wrote:
> On Fri, Dec 13, 2024 at 8:53 AM Niklas Hambüchen <mail(at)nh2(dot)me
> <mailto:mail(at)nh2(dot)me>> wrote:
>
> As a user, how can I know that 10000 list entries in "x in (A,
> B, ...)" will not also hit an arbitrary parser implementation detail
> hardcoded limit?
> How shall the user have confidence that the parser is better at
> handling multiple commas than multiple parens?
>
>
> As an application developer, you test it, especially if your application
> has the potential to generate insanely large queries. For the record,
> the catch point on my system using Postgres 17 seems to be around 8.4
> *MILLION* items for IN() lists:
>
> greg=# \set VERBOSITY terse
> greg=# do $$begin execute format('SELECT 1 WHERE 1 IN (%s1)',
> repeat('123,',8_385_000));end$$;
> DO
> greg=# do $$begin execute format('SELECT 1 WHERE 1 IN (%s1)',
> repeat('123,',8_390_000));end$$;
> ERROR:  invalid memory alloc request size 1073741824
>  
>
> None of that seems to be documented anywhere
>
>
> Documentation patches are always welcome. Perhaps at https://
> www.postgresql.org/docs/current/limits.html <https://www.postgresql.org/
> docs/current/limits.html> ?
>

FWIW I don't think it's practical to document such limits in detail.
Everyone knows the limits exist, and there's a lot of them - various
libraries we rely on (and we have plenty of them) may have a couple of
them, etc. It'd be a huge effort, and it's not clear where to draw the
line. And I'm not aware of other projects documenting such stuff in
detail - e.g. kernel certainly has a lot of such limits.

> In absence of such docs, one must build systems that later fail at
> arbitrary limits when e.g. the user clicks some larger number of
> checkboxes that construct a batch query.
>
>
> That's a bit of a strawman argument: queries this large are not going to
> be caused by checkboxes being clicked.
>

Not sure I understand the suggestion that "one must build systems that
later fail at arbitrary limits". It's a trade off, and it's perfectly
reasonable to not spend time optimizing for weirdly complex queries when
there's a much better/faster way to write the query, not hitting those
limits ...

> If I build some workaround today, e.g. splitting the query into
> multiple ones of max length N, how do I know it will still work in
> the future, e.g. if Postgres changes the Bison version or switches
> to a different parser?
>
>
> You don't, that's the nature of software. But it's fairly unlikely we'd
> switch to something that was MORE contraining than in the past. Still,
> don't play on the edge.
>  

Right. A lot of the mailing list traffic is discussions about possible
regressions. But even with that we have little control over changes in
the dependencies. This just means it's important to set baselines and do
testing as part of an upgrade.

>
> The hardcodedness of arbitrary small limits that don't scale itself
> is also a problem:
> One cannot configure postgres to allow queries that the hardware is
> perfectly able of handling.
>
> It would be a bit like GCC giving up to compile a file if it
> contains more than 10000 words.
>
>
> No, that's not an apt comparison at all. We cannot simply push the
> parser to accept anything, regardless of the impact on other parts of
> the system. Software engineering is all about tradeoffs. I agree with
> your point about documentation, but this seems like trying to pick a fight.
>

Yeah, it's a matter of trade offs. Not just technical ones, but also
(and maybe in the first place) economical - which improvements to spend
time on to get the most benefit. Time is a finite resource.

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-12-13 16:08:38 Re: "memory exhausted" in query parser/simplifier for many nested parentheses
Previous Message Greg Sabino Mullane 2024-12-13 14:44:26 Re: "memory exhausted" in query parser/simplifier for many nested parentheses