Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date: 2024-08-27 21:02:52
Message-ID: CAEze2WhQLi=7eFehX5=_frkvxT60+jfBke__p1O6bVDotgaV+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 16 Aug 2024 at 00:34, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, Aug 15, 2024 at 5:47 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > > > I'm asking, because
> > > > I'm not very convinced that 'primitive scans' are a useful metric
> > > > across all (or even: most) index AMs (e.g. BRIN probably never will
> > > > have a 'primitive scans' metric that differs from the loop count), so
> > > > maybe this would better be implemented in that framework?
> > >
> > > What do you mean by "within that framework"? They seem orthogonal?
> >
> > What I meant was putting this 'primitive scans' info into the
> > AM-specific explain callback as seen in the latest patch version.
>
> I don't see how that could work. This is fundamentally information
> that is only known when the query has fully finished execution.

If the counter was put into the BTScanOpaque, rather than the
IndexScanDesc, then this could trivially be used in an explain AM
callback, as IndexScanDesc and ->opaque are both still available while
building the explain output. As a result, it wouldn't bloat the
IndexScanDesc for other index AMs who might not be interested in this
metric.

> Again, this is already something that we track at the whole-table
> level, within pg_stat_user_tables.idx_scan. It's already considered
> index AM agnostic information, in that sense.

That's true, but for most indexes there is a 1:1 relationship between
loops and idx_scan counts, with ony btree behaving differently in that
regard. Not to say it isn't an important insight for btree, but just
that it seems to be only relevant for btree and no other index I can
think of right now.

> > > It's true that BRIN index scans will probably never show more than a
> > > single primitive index scan. I don't think that the same is true of
> > > any other index AM, though. Don't they all support SAOPs, albeit
> > > non-natively?
> >
> > Not always. For Bitmap Index Scan the node's functions can allow
> > non-native SAOP support (it ORs the bitmaps), but normal indexes
> > without SAOP support won't get SAOP-functionality from the IS/IOS
> > node's infrastructure, it'll need to be added as Filter.
>
> Again, what do you want me to do about it? Almost anything is possible
> in principle, and can be implemented without great difficulty. But you
> have to clearly say what you want, and why you want it.

I don't want anything, or anything done about it, but your statement
that all index AMs support SAOP (potentially non-natively) is not
true, as the non-native SAOP support is only for bitmap index scans,
and index AMs aren't guaranteed to support bitmap index scans (e.g.
pgvector's IVFFLAT and HNSW are good examples, as they only support
amgettuple).

> Yeah, non-native SAOP index scans are always bitmap scans. In the case
> of GIN, there are only lossy/bitmap index scans, anyway -- can't see
> that ever changing.

GIN had amgettuple-based index scans until the fastinsert path was
added, and with some work (I don't think it needs to be a lot) the
feature can probably be returned to the AM. The GIN internals would
probably only need relatively few changes, as they already seem to
mostly use precise TID-based scans - the only addition would be a
filter that prohibits returning tuples that were previously returned
while scanning the fastinsert path during the normal index scan.

> > And, in this case,
> > the use case seems quite index-specific, at least for IS/IOS nodes.
>
> I disagree. It's an existing concept, exposed in system views, and now
> in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less.

To be precise, it is not precisely that, because it's a different
counter that an AM must update when the pgstat data is updated if it
wants the explain output to reflect the stats counter accurately. When
an AM forgets to update one of these metrics (or fails to realize they
have to both be updated) then they'd be out-of-sync. I'd prefer if an
AM didn't have to account for it's statistics in more than one place.

> > This made me notice that you add a new metric that should generally be
> > exactly the same as pg_stat_all_indexes.idx_scan (you mention the
> > same).
>
> I didn't imagine that that part was subtle.

It wasn't, but it was not present in the first two paragraphs of the
mail, which I had only skimmed when I sent my first reply (as you
maybe could see indicated by the quote). That's why it took me until
my second reply to realise these were considered to be equivalent,
especially after I noticed the headerfile changes where you added a
new metric rather than pulling data from existing stats.

> > Can't you pull that data, instead of inventing a new place
> > every AMs needs to touch for it's metrics?
>
> No. At least not in a way that's scoped to a particular index scan.

Similar per-node counter data is pulled for the global (!) counters of
pgBufferUsage, so why would it be less possible to gather such metrics
for just one index's stats here? While I do think it won't be easy to
find a good way to integrate this into EXPLAIN's Instrumentation, I
imagine other systems (e.g. table scans) may benefit from a better
integration and explanation of pgstat statistics in EXPLAIN, too. E.g.
I'd love to be able to explain how many times which function was
called in a plans' projections, and what the relevant time expendature
for those functions is in my plans. This data is available with
track_functions enabled, and diffing in the execution nodes should
allow this to be shown in EXPLAIN output. It'd certainly be more
expensive than not doing the analysis, but I believe that's what
EXPLAIN options are for - you can show a more detailed analysis at the
cost of increased overhead in the plan execution.

Alternatively, you could update the patch so that only the field in
IndexScan would need to be updated by the index AM by making the
executor responsible to update the relation's stats at once at the end
of the query with the data from the IndexScanDesc.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2024-08-27 21:06:24 Re: Invalid "trailing junk" error message when non-English letters are used
Previous Message Nathan Bossart 2024-08-27 21:01:58 Re: Partitioned tables and [un]loggedness