Re: [NOVICE] WHERE clause not used when index is used

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tobias Florek <postgres(at)ibotty(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: [NOVICE] WHERE clause not used when index is used
Date: 2016-03-01 20:03:39
Message-ID: 23030.1456862619@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-novice

I wrote:
> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
>> I can only get the issue when the sort order of the individual keys does
>> not correlate and the operator sorts according to the first column and
>> there are duplicate values for the first column.

> Yeah, I think the combination of ASC and DESC columns may be necessary to
> break things. It needs closer analysis.

Okay, I've now studied the patch more carefully and understood that it
wasn't quite so naive as to not consider multi-column cases at all.
It does work for simple scalar multi-column cases, because the test on
SK_BT_MATCHED is placed so that it only skips checking the first column's
condition, not the conditions on lower-order columns. Rather, the problem
is failure to consider ROW() comparisons, which do not necessarily have
the same simple behavior as scalar comparisons. They sort of accidentally
work, *if* the lower-order columns in the ROW() comparison match the
index's lower-order columns as to position and index direction. Tobias'
example fails because the second column in the ROW() isn't sorted the
same, and you could also make it fail with a 3-or-more-column index in
which the ROW() comparison tested, say, just the first and third columns.

What I think we should do is move the SK_BT_MATCHED flag down to the
first sub-key of the ROW() comparison, which is the one that does behave
the same as in scalar cases.

The fact that it doesn't fail for the case where the lower-order columns
of the ROW() do match the index shows that this still leaves something on
the table: within a ROW() comparison, you could set SK_BT_MATCHED on more
than just the first sub-key, *if* the subsequent columns match the index.
This makes me think that the analysis should be getting done in or near
_bt_mark_scankey_required, not in the rather random place it is now,
because _bt_mark_scankey_required already understands about sub-keys
matching the index or not.

However ... there is an even larger problem, which is that the patch
thinks it can use skey->sk_flags for what's effectively transient state
storage --- not merely transient, but scan-direction-dependent. This
means that a cursor that reads forwards for awhile and then turns around
and reads backwards will get the wrong answer, even without anything as
complicated as multiple key columns. It's a bit harder to exhibit such a
bug than you might guess, because of btree's habit of prefetching an index
page at a time; you have to make the scan cross an index page boundary
before you turn around and back it up. Bearing that in mind, I was soon
able to devise a failure case in the regression database:

regression=# set enable_seqscan TO 0;
SET
regression=# set enable_bitmapscan TO 0;
SET
regression=# begin;
BEGIN
regression=# declare c cursor for select unique1,unique2 from tenk1 where unique1 > 9500;
DECLARE CURSOR
regression=# fetch 20 from c;
unique1 | unique2
---------+---------
9501 | 5916
9502 | 1812
9503 | 1144
9504 | 9202
9505 | 4300
9506 | 5551
9507 | 847
9508 | 4093
9509 | 9418
9510 | 1862
9511 | 848
9512 | 4823
9513 | 1125
9514 | 9276
9515 | 1160
9516 | 5177
9517 | 3600
9518 | 9677
9519 | 5518
9520 | 1429
(20 rows)

regression=# fetch backward 30 from c;
unique1 | unique2
---------+---------
9519 | 5518
9518 | 9677
9517 | 3600
9516 | 5177
9515 | 1160
9514 | 9276
9513 | 1125
9512 | 4823
9511 | 848
9510 | 1862
9509 | 9418
9508 | 4093
9507 | 847
9506 | 5551
9505 | 4300
9504 | 9202
9503 | 1144
9502 | 1812
9501 | 5916
9500 | 3676
9499 | 927
9498 | 2807
9497 | 6558
9496 | 1857
9495 | 1974
9494 | 560
9493 | 3531
9492 | 9875
9491 | 7237
9490 | 8928
(30 rows)

Ooops.

I believe the way to fix this would be to stop regarding SK_BT_MATCHED
as state, and instead treat it as a scankey property identified during
_bt_preprocess_keys, analogously to SK_BT_REQFWD/SK_BT_REQBKWD --- and,
like those, you'd need two flags not one since the properties will be
determined independently of knowing which direction you'll be going in.

In any event, I am now of the opinion that this patch needs to be reverted
outright and returned to the authors for redesign. There are too many
things wrong with it and too little reason to think that we have to have
it in 9.5.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-01 20:04:40 Re: 2016-03 Commitfest Manager
Previous Message David Steele 2016-03-01 19:39:45 2016-03 Commitfest Manager

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2016-03-01 21:20:33 Re: [NOVICE] WHERE clause not used when index is used
Previous Message Tom Lane 2016-03-01 18:47:05 Re: [NOVICE] WHERE clause not used when index is used