From: | Niklas Hambüchen <mail(at)nh2(dot)me> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | ruben(at)benaco(dot)com, Niklas Hambüchen <niklas(at)benaco(dot)com> |
Subject: | "memory exhausted" in query parser/simplifier for many nested parentheses |
Date: | 2024-12-11 21:59:41 |
Message-ID: | 161892c0-65fc-42a6-8af9-b098879decd4@nh2.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hello,
we noticed that when writing a large query of form
(((A OR B) OR C) OR ...)
with many terms (e.g. 13000) this causes one of two errors:
1. memory exhausted at or near "("
2. stack depth limit exceeded
This seems bugged/unfortunate, as writing
A OR B OR C OR ...
does not cause the errors.
This means the query fails depending whether or not the user provided parentheses in the query.
Note the query doesn't have to involve `OR`, it can be anyting involving many parentheses.
I tracked down the cause of the two errors:
* In the Bison parser:
#define YYMAXDEPTH 10000
http://git.savannah.gnu.org/cgit/bison.git/tree/src/parse-gram.c?h=v3.8.2#n1376
This causes error:
ERROR: memory exhausted at or near \"(\"
* In the expression simplifier:
simplify_function() at clauses.c:3921
calls expression_tree_mutator() at nodeFuncs.c:3117
calls eval_const_expressions_mutator() at clauses.c:2449
calls simplify_function()
This causes error:
stack depth limit exceeded
from
https://github.com/postgres/postgres/blob/REL_14_12/src/backend/tcop/postgres.c#L3481-L3498
A minimal repro for this can be found in:
https://github.com/questdb/questdb/issues/3841#issuecomment-2536599109
(I'm not affiilated with QuestDB, it seems to be a database also using Postgres's parser.)
Repeating the repro here:
#!/usr/bin/env python3
import sys
N = int(sys.argv[1])
exp = ""
for i in range(N):
exp += "lower("
exp += "'ABCDEFGHI'"
for i in range(N):
exp += ")"
assert exp.count("(") == exp.count(")")
print(exp)
Example invocation:
psql -h localhost -p 5432 postgres -c "select $(python3 expression.py 10000); select 1"
For playing with the limits in Postgres, the `YYMAXDEPTH 10000` constant can be changed by manually changing it `gram.c` (generated from `gram.y`).
Both above problems stem from using a recursive descent parser in a language with limited stack size (C, or Bison's manual stack implemented with `goto`).
The Bison parser runs before the expression simplifier.
So if the expression is too large, we get `memory exhausted`.
If the expression is small enough to make it to the simplifer, we get `stack depth limit exceeded` if it's too large for the simplier.
The simplifier stack limit can be adjusted with
max_stack_depth = 2MB (the default)
max_stack_depth = 7680kB (possible maximum)
and this maximum is of course still too low if we have tens of terms.
We consider this a bug because it means one cannot easily generate large queries that query a couple thousand entries in a `SELECT ... WHERE` condition.
It would be great if postgres can alleviate these limits, or at least succeed to parse everything that's written with parentheses equally well when written with parentheses.
Thank you!
Niklas & Ruben
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2024-12-11 22:39:35 | Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails |
Previous Message | Artur Zakirov | 2024-12-11 21:41:16 | Re: pg_dump crash on identity sequence with not loaded attributes |