Re: Fwd: Planner oversight for GIN indices?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kurt Kartaltepe <kkartaltepe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fwd: Planner oversight for GIN indices?
Date: 2017-08-02 21:47:46
Message-ID: 26525.1501710466@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kurt Kartaltepe <kkartaltepe(at)gmail(dot)com> writes:
> ... I feel like this is potentially a question more for pgsql-hackers but
> the mailing list suggests asking elsewhere before posting there and
> this wasnt quite a "bug". A quick uninformed peek at the planner code
> makes me think this isn't exactly trivial but from the "simplicity" of
> the workaround id hope it is possible. This seems like an issue that
> would affect all inverse indices or more generally any index where
> multiple clauses against the same index might have different
> performance characteristics that could be determined at plan time (so
> only for constant restrictions).

Yeah, the planner's traditional behavior here is just to throw every
potentially indexable clause into the indexqual list and let the index AM
sort things out at runtime. This is demonstrably the best thing to do
for btree, where the AM itself can identify redundant or contradictory
scan clauses with 100% accuracy at scan start. I think that GIN might
be the only case where including seriously-unselective quals is really
a big loser --- the cause of that being that it might have to iterate
over some very long posting lists. We did some work recently to make
GIN smarter about combinations of long and short posting lists, but it
seems that it's still not smart enough to cover this case completely.

I'm not sure offhand whether it's better to make the planner try to
identify indexable clauses it should not hand to the index AM, or to
insist that the index AM ought to be smart enough to cope. The difficulty
with the former approach is that the planner can't be counted on to do the
right thing if there are non-constant comparison values, plus if there are
a lot of potentially-indexable clauses it's not cheap to consider all
combinations. The difficulty with the latter approach is that the index
AM might not have the necessary information either. If the best it can do
is to look into the planner's statistics, that feels a bit duplicative.
(But it might still be a win, because we'd definitely have the actual
comparison value at runtime.)

Anyway, sorry, this is a research problem rather than something that's
easy to fix.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter Koukoulis 2017-08-02 22:42:16 select md5 result set
Previous Message Rob Sargent 2017-08-02 19:48:10 Re: bidirectional mapping?