Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Date: 2011-12-20 05:08:49
Message-ID: 10309.1324357729@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> Since PostgreSQL 9.1, GIN has new cost estimation code. This code
> assumes that the only expression type it's going to see is OpExpr.
> However, ScalarArrayOpExpr has also been possible in earlier versions.
> Estimating col <op> ANY (<array>) queries segfaults in 9.1 if there's
> a GIN index on the column.

Ugh. I think we subconsciously assumed that ScalarArrayOpExpr couldn't
appear here because GIN doesn't set amsearcharray, but of course that's
wrong.

> (It seems that RowCompareExpr and NullTest clauses are impossible for
> a GIN index -- at least my efforts to find such cases failed)

No, those are safe for the moment --- indxpath.c has a hard-wired
assumption that RowCompareExpr is only usable with btree, and NullTest
is only considered to be indexable if amsearchnulls is set. Still,
it'd likely be better if this code ignored unrecognized qual expression
types rather than Assert'ing they're not there. It's not like the cases
it *does* handle are done so perfectly that ignoring an unknown qual
could be said to bollix the estimate unreasonably.

> Attached is an attempted fix for the issue. I split out the code for
> the extract call and now run that for each array element, adding
> together the average of (partialEntriesInQuals, exactEntriesInQuals,
> searchEntriesInQuals) for each array element. After processing all
> quals, I multiply the entries by the number of array_scans (which is
> the product of all array lengths) to get the total cost.

> This required a fair bit of refactoring, but I tried to follow the
> patterns for OpExpr pretty strictly -- discounting scans over NULL
> elements, returning 0 cost when none of the array elements can match,
> accounting for cache effects when there are multiple scans, etc. But
> it's also possible that I have no idea what I'm really doing. :)

Hmm. I am reminded of how utterly unreadable "diff -u" format is for
anything longer than single-line changes :-( ... but I think I don't
like this refactoring much. Will take a closer look tomorrow.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikhil Sontakke 2011-12-20 06:14:29 Re: Review: Non-inheritable check constraints
Previous Message Robert Haas 2011-12-20 05:00:37 Re: JSON for PG 9.2