Re: "memory exhausted" in query parser/simplifier for many nested parentheses

From: Niklas Hambüchen <mail(at)nh2(dot)me>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org, ruben(at)benaco(dot)com, Niklas Hambüchen <niklas(at)benaco(dot)com>
Subject: Re: "memory exhausted" in query parser/simplifier for many nested parentheses
Date: 2024-12-13 17:06:43
Message-ID: a14f012f-a05d-4a19-855c-36a5d512e438@nh2.me
Lists: pgsql-bugs

Hi Greg,

thanks for testing the parser behaviour of "x in (A, B, ...)"; it is useful to know that.

> queries this large are not going to be caused by checkboxes being clicked.

This is how I encountered it:
Selecting 10k files (the equivalent of a Shift-click range selection in most GUI file managers) and using them in a query.
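
A minimal sketch of what such a generated query might look like (the table and column names are made up for illustration):

```sql
-- Hypothetical schema: the application turns the 10k selected file paths
-- into one big IN list (abbreviated here):
SELECT id, path, size
FROM files
WHERE path IN (
    '/photos/2024/img_0001.jpg',
    '/photos/2024/img_0002.jpg'
    -- ... roughly 10,000 more entries ...
);
```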

> We cannot simply push the parser to accept anything, regardless of the impact on other parts of the system.

I'm not suggesting we remove limits regardless of impact.
There are good limits and bad limits:

I think the happy path is when programs scale to the users' machines and tasks, constrained by limits that achieve specific goals.
Many of postgres's limits from https://www.postgresql.org/docs/17/runtime-config-resource.html exist to keep queries from blowing up resource usage unreasonably.
They are good limits, set to sensible defaults by the developers, and tunable by users to fit their workloads.
The same goes for most of Linux's limits, such as `ulimit`.
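
For instance, such limits can be inspected and adjusted at run time (a rough illustration; the particular settings and values here are only examples):

```sql
-- Good limits are runtime-configurable knobs, e.g.:
SHOW work_mem;                   -- inspect the per-operation memory limit
SET work_mem = '256MB';          -- raise it for this session's workload
SET statement_timeout = '5min';  -- cap runaway statements for this session
```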

In contrast, the impact of `YYMAXDEPTH` on postgres looks like a historical accident.

* It was set to `YYMAXDEPTH 500` by Richard Stallman in _1988_ in Bison commit f791019c "entered into RCS" (so the value is probably even older).
* It was then changed to `YYMAXDEPTH 10000` in 1993.

So we're looking at a hardcoded value that is more than 35 years old; a bad limit that probably should always have been a runtime-configurable parameter.

Maybe in 1993 "10000 of something" was a huge limit, but it isn't today.
Most modern parsers do not hardcode key aspects such as nesting depth, because what counts as "a lot" depends on the application, and it often isn't 4 or 500 or 10000.
It also doesn't fit well with what Postgres usually delivers:
As you showed, postgres by default supports "millions of things" in queries, so it really feels like a surprise bug that this limit is much lower when you write the query in an equally long but slightly different way, just because that specific syntax path goes through a library with hardcodes from 1993.
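
To make the asymmetry concrete (an illustrative sketch; the exact nesting depth at which the error appears depends on the grammar and the Bison stack limit):

```sql
-- A flat IN list parses fine even with a huge number of elements:
SELECT 1 WHERE 1 IN (1, 2, 3 /* , ... millions of literals ... */);

-- But the same kind of "long query", written as deeply nested parentheses
-- (thousands of levels, abbreviated here), fails with "memory exhausted":
SELECT ((((((((((1))))))))));
```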

So I think the ideal solution would be:

* Make `YYMAXDEPTH` configurable at run-time (as it always should have been), and add a postgres config for it with a good default (sketched below).
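
Purely to illustrate the user-facing shape of that proposal (this setting does not exist today; the name is hypothetical):

```sql
-- Hypothetical, not an existing GUC: a tunable parser nesting/stack depth,
-- analogous to max_stack_depth, adjustable per session or in postgresql.conf:
SET parser_max_nesting_depth = 100000;
```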

Of course it's not for me to judge whether this is a high-priority topic for postgres, but I think it would be fair to classify it as a "one of the tools we use hardcodes a silly limit that doesn't fit with our other limits and thus triggers user surprise" bug, as opposed to "postgres authors designed it this way".

Greetings,
Niklas
