Re: Index Skip Scan

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Floris Van Nee <florisvannee(at)optiver(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, James Coleman <jtc331(at)gmail(dot)com>
Subject: Re: Index Skip Scan
Date: 2019-06-01 10:28:28
Message-ID: CA+q6zcU=8OEKw8bc2DHsf7xSFqaT7gOjc-_nbX8FO1N4eizQHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Sat, Jun 1, 2019 at 6:10 AM Floris Van Nee <florisvannee(at)optiver(dot)com> wrote:
>
> After some talks with Jesper at PGCon about the Index Skip Scan, I started
> testing this patch, because it seems to have great potential in speeding up
> many of our queries (great conference by the way, really enjoyed my first
> time being there!). I haven't looked in depth to the code itself, but I
> focused on some testing with real data that we have.

Thanks!

> Actually I'd like to add something to this. I think I've found a bug in the
> current implementation. Would someone be able to check?
>
> The following query uses an index skip scan and returns just 1 row (incorrect!)
>
> select distinct on (market, feedcode) market, feedcode
> from streams.base_price
> where market='TEST'
>
> The following query still uses the regular index scan and returns many more
> rows (correct)
> select distinct on (market, feedcode) *
> from streams.base_price
> where market='TEST'

Yes, good catch, I'll investigate. Looks like in the current implementation
something is not quite right, when we have this order of columns in an index
and where clause (e.g. in the examples above everything seems fine if we create
index over (feedcode, market) and not the other way around).

> As far as I can see, there are two main problems with that at the moment.
>
> 1) Only support for Index-Only scan at the moment, not for regular index
> scans. This was already mentioned upthread and I can understand that it
> was left out until now to constrain the scope of this. However, if we were
> to support 'distinct on' + selecting columns that are not part of the
> index we need a regular index scan instead of the index only scan.

Sure, it's something I hope we can tackle as the next step.

> select distinct feedcode from prices -- approx 10ms
>
> select distinct feedcode from prices where updated_at <= '1999-01-01 00:00' -- approx 200ms
>
> Both use the index skip scan, but the first one is very fast, because it can
> skip large parts of the index. The second one scans the full index, because
> it never finds any row that matches the where condition so it can never skip
> anything.

Interesting, I'll take a closer look.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2019-06-01 12:53:35 Re: PostgreSQL vs SQL/XML Standards
Previous Message Rafia Sabih 2019-06-01 10:03:30 Re: Index Skip Scan