Re: Why will hashed SubPlan not use multiple batches

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why will hashed SubPlan not use multiple batches
Date: 2013-01-25 21:37:58
Message-ID: 19913.1359149878@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> A hashed SubPlan will not be used if it would need more than one
> batch. Is there a fundamental reason for that, or just that no one
> got around to adding it?

It can't, really. Batching a hash join requires freedom to reorder the
rows on both sides of the join. A SubPlan, by definition, must deliver
the correct answer for the current outer row on demand.
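
To illustrate the reordering point, here is a toy sketch in Python. It is
not PostgreSQL's executor code: the function names, the integer keys, and
the two-batch split are all assumptions made for the example, and NULL
handling is ignored. The first function answers each outer row as it
arrives from a single in-memory table; the second partitions both inputs
by hash and works through one batch at a time, so its output is grouped
by batch rather than following outer-row order.

```python
from collections import defaultdict

def hashed_subplan_probe(outer_rows, inner_values):
    """Answer 'is this value absent from the inner set?' row by row.

    The whole inner set sits in one in-memory table, so every outer row
    gets its answer immediately, in the order the outer rows arrive.
    """
    table = set(inner_values)
    for row in outer_rows:
        yield row, row not in table

def batched_anti_join(outer_rows, inner_values, nbatches):
    """Toy multi-batch hash anti-join (hypothetical, NULLs ignored).

    Both inputs are partitioned by hash and processed batch by batch,
    so surviving rows come out grouped by batch, not in outer order.
    """
    outer_batches = defaultdict(list)
    inner_batches = defaultdict(set)
    for row in outer_rows:
        outer_batches[hash(row) % nbatches].append(row)
    for value in inner_values:
        inner_batches[hash(value) % nbatches].add(value)
    for batch in range(nbatches):
        table = inner_batches[batch]  # only this batch is "in memory"
        for row in outer_batches[batch]:
            if row not in table:
                yield row

outer = [5, 2, 7, 4, 1]
inner = [2, 5, 6]
probe_results = list(hashed_subplan_probe(outer, inner))
join_results = list(batched_anti_join(outer, inner, nbatches=2))
```

Both approaches find the same surviving rows, but only the first can
answer for "the current outer row" at the moment it is asked; the batched
version has to defer rows whose batch is not currently loaded.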

The only real fix for your problem would be to teach the regular hash
join machinery how to handle NOT IN semantics accurately, so that we
could transform this query into a regular kind of join instead of a
seqscan with a SubPlan wart attached to it. In the past it hasn't
really seemed worth it, since 99% of the time, once you question
somebody about why they're insisting on NOT IN rather than NOT EXISTS,
you find out that they didn't really want NOT IN semantics after all.
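
For readers unsure how the two forms differ, the following sketch shows
the NULL behaviour in question. It uses Python's built-in sqlite3 module
rather than PostgreSQL, purely so that it runs without a server; the
table and column names (outer_t, inner_t, a, b) are made up for the
example. A single NULL in the inner select makes NOT IN reject every
outer row, while NOT EXISTS keeps the unmatched ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (a INTEGER);
    CREATE TABLE inner_t (b INTEGER);
    INSERT INTO outer_t VALUES (1), (2), (3);
    INSERT INTO inner_t VALUES (2), (NULL);
""")

# NOT IN: for a = 1 or 3 the comparison against NULL is unknown, so the
# predicate is NULL rather than true and the row is filtered out.
not_in_rows = conn.execute(
    "SELECT a FROM outer_t WHERE a NOT IN (SELECT b FROM inner_t) ORDER BY a"
).fetchall()

# NOT EXISTS: the inner NULL never satisfies i.b = o.a, so it is ignored.
not_exists_rows = conn.execute(
    "SELECT a FROM outer_t o WHERE NOT EXISTS "
    "(SELECT 1 FROM inner_t i WHERE i.b = o.a) ORDER BY a"
).fetchall()

print(not_in_rows)      # no rows survive
print(not_exists_rows)  # rows 1 and 3 survive
```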

We could also consider adding logic to notice NOT NULL constraints on
the inner select's outputs, which would allow the planner to prove that
the query can be transformed to a regular antijoin. That only helps
people who've put on such constraints though ...
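
A small model of SQL's three-valued logic, again in Python and again only
a sketch, shows why the absence of NULLs matters for that proof. The
helper names are hypothetical, and None stands in for NULL/unknown. When
the inner values contain no NULLs, NOT IN used as a WHERE filter keeps
exactly the rows an antijoin would keep. The model also suggests that a
NULL on the outer side breaks the equivalence, a case the paragraph above
does not go into.

```python
def sql_not_in(x, values):
    """Model of SQL's x NOT IN (values): True, False, or None (unknown)."""
    if not values:
        return True          # NOT IN over an empty set is always true
    if x is None:
        return None          # NULL compared with anything is unknown
    if any(v is not None and v == x for v in values):
        return False         # a definite match
    if any(v is None for v in values):
        return None          # no match, but a NULL leaves it unknown
    return True

def anti_join_keeps(x, values):
    """An antijoin keeps the outer row when no inner value equals x."""
    return not any(x is not None and v is not None and v == x for v in values)

def agree_as_filters(x, values):
    # WHERE keeps a row only when the predicate is True; unknown is dropped.
    return (sql_not_in(x, values) is True) == anti_join_keeps(x, values)

no_nulls = all(agree_as_filters(x, [2, 4]) for x in [1, 2, 3, 4])
inner_null = agree_as_filters(1, [2, None])
outer_null = agree_as_filters(None, [2, 4])
```

Here no_nulls is True, while inner_null and outer_null are both False,
which is the gap a NOT NULL proof would have to close.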

> I have no control over the real query itself

Sigh. Another badly-written ORM, I suppose?

regards, tom lane
