Re: Adding skip scan (including MDAM style range skip scan) to nbtree

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Masahiro(dot)Ikeda(at)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Masao(dot)Fujii(at)nttdata(dot)com
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date: 2024-09-19 19:22:46
Message-ID: CAH2-Wz=iuqL1KbEwsUKmB1ucByTjRboLxqtiG+OtDa9RHud4ZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 16, 2024 at 6:05 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> For example, one of the slowed down queries is query 702 (top of page 8
> in the PDF). The query is pretty simple:
>
> explain (analyze, timing off, buffers off)
> select id1,id2 from t_1000000_1000_1_2
> where NOT (id1 in (:list)) AND (id2 = :value);
>
> and it was executed on a table with random data in two columns, each
> with 1000 distinct values.

I cannot recreate this problem using the q702.sql repro you provided.
Feels like I'm missing a step, because I find that skip scan wins
nicely here.

> This is perfectly random data, so a great
> match for the assumptions in costing etc.

FWIW, I wouldn't say that this is a particularly sympathetic case for
skip scan. It's definitely still a win, but less than other cases I
can imagine. This is due to the relatively large number of rows
returned by the scan. Plus 1000 distinct leading values for a skip
array isn't all that low, so we end up scanning over 1/3 of all of the
leaf pages in the index.

BTW, be careful to distinguish between leaf pages and internal pages
when interpreting "Buffers:" output with the patch. Generally
speaking, the patch repeats many internal page accesses, which needs
to be taken into account when compare "Buffers:" counts against
master. It's not uncommon for 3/4 or even 4/5 of all index page hits
to be for internal pages with the patch. Whereas on master the number
of internal page hits is usually tiny. This is one reason why the
additional context provided by "Index Searches:" can be helpful.

> But with uncached data, this runs in ~50 ms on master, but takes almost
> 200 ms with skipscan (these timings are from my laptop, but similar to
> the results).

Even 50ms seems really slow for your test case -- with or without my
patch applied.

Are you sure that this wasn't an assert-enabled build? There's lots of
extra assertions for the code paths used by skip scan for this, which
could explain the apparent regression.

I find that this same query takes only ~2.056 ms with the patch. When
I disabled skip scan locally via "set skipscan_prefix_cols = 0" (which
should give me behavior that's pretty well representative of master),
it takes ~12.039 ms. That's exactly what I'd expect for this query: a
solid improvement, though not the really enormous ones that you'll see
when skip scan is able to avoid reading many of the index pages that
master reads.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-09-19 19:45:04 Re: Partitioned tables and [un]loggedness
Previous Message Masahiko Sawada 2024-09-19 18:49:18 Re: Conflict detection for update_deleted in logical replication