[WIP] Better partial index-only scans

From: Joshua Yanovski <pythonesque(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [WIP] Better partial index-only scans
Date: 2014-03-16 12:08:01
Message-ID: CABz-M-GrkvrMc9ni5S0mX53rtZg3=SZNEYrU_A8RigQ2b3MGNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Proof of concept initial patch for enabling index only scans for
partial indices even when an attribute is not in the target list, as
long as it is only used in restriction clauses that can be proved by
the index predicate. This also works for index quals, though they
still can't be used in the target list. However, this patch may be
inefficient since it duplicates effort that is currently delayed until
after the best plan is chosen.

The patch works by basically repeating the logic from
create_indexscan_plan in createplan.c that determines which clauses
can't be discarded, instead of the current approach, which just
assumes that any attributes referenced anywhere in a restriction
clause has to be a column in the relevant index. It should build
against master and passes make check for me. It also includes a minor
fix in the same code in createplan.c to make sure we're explicitly
comparing an empty list to NIL, but I can take that out if that's not
considered in scope. If this were the final patch I'd probably
coalesce the code used in both places into a single function, but
since I'm not certain that the implementation in check_index_only
won't change substantially I held off on that.

Since the original comment suggested that this was not done due to
planner performance concerns, I assume the performance of this
approach is unacceptable (though I did a few benchmarks and wasn't
able to detect a consistent difference--what would be a good test for
this?). As such, this is intended as more of a first pass that I can
build on, but I wanted to get feedback at this stage on where we can
improve (particularly if there were already ideas on how this might be
done, as the comment hints). Index only scans cost less than regular
index scans so I don't think we can get away with waiting until we've
chosen the best plan before we do the work described above. That
said, as I see it performance could improve in any combination of five
ways:
* Improve the performance of determining which clauses can't be
discarded (e.g. precompute some information about equivalence classes
for index predicates, mess around with the order in which we check the
clauses to make it fail faster, switch to real union-find data
structures for equivalence classes).
* Find a cleverer way of figuring out whether a partial index can be
used than just checking which clauses can't be discarded.
* Use a simpler heuristic (that doesn't match what use to determine
which clauses can be discarded, but still matches more than we do
now).
* Take advantage of work we do here to speed things up elsewhere (e.g.
if this does get chosen as the best plan we don't need to recompute
the same information in create_indexscan_plan).
* Delay determining whether to use an index scan or index only scan
until after cost analysis somehow. I'm not sure exactly what this
would entail.

Since this is my first real work with the codebase, I'd really
appreciate it if people could help me figure out the best approach
here (and, more importantly, if one is necessary based on benchmarks).
And while this should go without saying, if this patch doesn't
actually work then please let me know, since all the above is based on
the assumption that what's there is enough :)

Thanks,
Joshua Yanovski

Attachment Content-Type Size
partial_index_only_scans_v1.patch text/plain 5.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-03-16 12:23:40 Re: jsonb status - 'JsonbValue' has no member named 'size'
Previous Message Greg Stark 2014-03-16 11:58:53 AUTOCOMMIT off + ON_ERROR_ROLLBACK usability