"memory exhausted" in query parser/simplifier for many nested parentheses

From: Niklas Hambüchen <mail(at)nh2(dot)me>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: ruben(at)benaco(dot)com, Niklas Hambüchen <niklas(at)benaco(dot)com>
Subject: "memory exhausted" in query parser/simplifier for many nested parentheses
Date: 2024-12-11 21:59:41
Message-ID: 161892c0-65fc-42a6-8af9-b098879decd4@nh2.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,

we noticed that when writing a large query of form

(((A OR B) OR C) OR ...)

with many terms (e.g. 13000) this causes one of two errors:

1. memory exhausted at or near "("
2. stack depth limit exceeded

This seems bugged/unfortunate, as writing

A OR B OR C OR ...

does not cause the errors.
This means the query fails depending whether or not the user provided parentheses in the query.

Note the query doesn't have to involve `OR`, it can be anyting involving many parentheses.

I tracked down the cause of the two errors:

* In the Bison parser:

#define YYMAXDEPTH 10000

http://git.savannah.gnu.org/cgit/bison.git/tree/src/parse-gram.c?h=v3.8.2#n1376
This causes error:
ERROR: memory exhausted at or near \"(\"

* In the expression simplifier:

simplify_function() at clauses.c:3921
calls expression_tree_mutator() at nodeFuncs.c:3117
calls eval_const_expressions_mutator() at clauses.c:2449
calls simplify_function()

This causes error:
stack depth limit exceeded
from
https://github.com/postgres/postgres/blob/REL_14_12/src/backend/tcop/postgres.c#L3481-L3498

A minimal repro for this can be found in:

https://github.com/questdb/questdb/issues/3841#issuecomment-2536599109

(I'm not affiilated with QuestDB, it seems to be a database also using Postgres's parser.)

Repeating the repro here:

#!/usr/bin/env python3

import sys

N = int(sys.argv[1])

exp = ""
for i in range(N):
exp += "lower("

exp += "'ABCDEFGHI'"

for i in range(N):
exp += ")"

assert exp.count("(") == exp.count(")")
print(exp)

Example invocation:

psql -h localhost -p 5432 postgres -c "select $(python3 expression.py 10000); select 1"

For playing with the limits in Postgres, the `YYMAXDEPTH 10000` constant can be changed by manually changing it `gram.c` (generated from `gram.y`).

Both above problems stem from using a recursive descent parser in a language with limited stack size (C, or Bison's manual stack implemented with `goto`).

The Bison parser runs before the expression simplifier.
So if the expression is too large, we get `memory exhausted`.
If the expression is small enough to make it to the simplifer, we get `stack depth limit exceeded` if it's too large for the simplier.
The simplifier stack limit can be adjusted with

max_stack_depth = 2MB (the default)
max_stack_depth = 7680kB (possible maximum)

and this maximum is of course still too low if we have tens of terms.

We consider this a bug because it means one cannot easily generate large queries that query a couple thousand entries in a `SELECT ... WHERE` condition.

It would be great if postgres can alleviate these limits, or at least succeed to parse everything that's written with parentheses equally well when written with parentheses.

Thank you!
Niklas & Ruben

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2024-12-11 22:39:35 Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Previous Message Artur Zakirov 2024-12-11 21:41:16 Re: pg_dump crash on identity sequence with not loaded attributes