Avoiding deeply nested AND/OR trees in the parser

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Avoiding deeply nested AND/OR trees in the parser
Date: 2014-02-25 18:15:09
Message-ID: 3631.1393352109@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in
http://www.postgresql.org/message-id/BAY176-W382A9DE827EBC8E602B7BBC5860@phx.gbl
there's a complaint about getting "stack depth limit exceeded" from a
query of the form

select count(*) from gui_die_summary where (x_coord, y_coord) in
((25,5),(41,13),(25,7),(28,3),(25,8),(34,7),(26,6),(21,10), ...);

when there's a few thousand pairs in the IN list. The reason for this
is that transformAExprIn() generates a tree of nested OR expressions,
and then recursion in assign_collations_walker() runs out of stack space.
(If assign_collations_walker() hadn't failed, something else probably
would later, so it's wrong to blame that function in particular.)

I reproduced this problem in the regression database using generated
queries like so:

print "explain select * from tenk1 where (thousand, tenthous) in (\n";
for ($i = 0; $i < 10000; $i++) {
print ",\n" if $i > 0;
print "($i,$i)";
}
print ");\n";

On my machine, HEAD fails at about 9000 pairs with this test case.

While I commented in the pgsql-novice thread that there are better ways
to do this, it still seems like a limitation we probably ought to fix
if it's not too hard. In the case of transformAExprIn, we could generate
an N-argument OR or AND node instead of a nest; this is already done
for example in make_row_comparison_op(). The attached quick-hack patch
does so, and I verified that the system could handle a million pairs
with this in place. (It takes about 20 seconds and 20GB of memory to
plan such a case, so I still say it's a bad idea, but at least we can
do it.) There is similar code in make_row_distinct_op(), which perhaps
ought to be fixed as well.

However, this isn't exactly the end of the story, because if you were to
dump out a view generated from a query like this, it would contain a long
chain of OR clauses, which would mean that reloading the view would put
you right back in stack overflow territory. Really if we wanted to fix
this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
they recognized nested ANDs/ORs and flattened them on the spot. This
might not be a bad idea, but it's starting to look like more than a quick
hack patch.

Does this seem worth pursuing, or shall we leave it alone?

regards, tom lane

Attachment Content-Type Size
flatten-ORs-in-transformAExprIn.patch text/x-diff 2.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-02-25 18:21:46 Re: [PATCH] Use MAP_HUGETLB where supported (v3)
Previous Message Bruce Momjian 2014-02-25 17:45:10 Re: jsonb and nested hstore