Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

From: Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row
Date: 2022-06-30 08:24:00
Message-ID: CANugjhtG0fygzMfuCCeyqK7dZfvWPRSSzBgSbQfbyNNWMDN9LA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 27 Jun 2022 at 15:52, Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Mon, Jun 27, 2022 at 1:40 PM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com>
> wrote:
> >
> > Hi,
> >
> > On 6/27/22 9:31 AM, Hamid Akhtar wrote:
> >
> >
> > Hello Hackers,
> >
> > While working on one of my blogs on the B-Tree indexes, I needed to look
> at a range of B-Tree page statistics. So the goto solution was to use
> pageinspect. However, reviewing stats for multiple pages meant issuing
> multiple queries.
>
> +1 to improve the API.
>
> > I felt that there's an opportunity for improvement in the extension by
> extending the API to output the statistics for multiple pages with a single
> query.
> >
> > That attached patch is based on the master branch. It makes the
> following changes to the pageinspect contrib module:
> > - Updates bt_page_stats_internal function to accept 3 arguments instead
> of 2.
> > - The function now uses SRF macros to return a set rather than a single
> row. The function call now requires specifying column names.
> >
> > The extension version is bumped to 1.11 (PAGEINSPECT_V1_11).
> > To maintain backward compatibility, for versions below 1.11, the
> multi-call mechanism is ended to keep the old behavior consistent.
> >
> > Regression test cases for the module are updated as well as part of this
> change. Here is a subset of queries that are added to the btree.sql test
> case file for pageinspect.
> >
> > ----
> > CREATE TABLE test2 AS (SELECT generate_series(1, 5000) AS col1);
> > CREATE INDEX test2_col1_idx ON test2(col1);
> > SELECT * FROM bt_page_stats('test2_col1_idx', 1, 2);
> >
> > For example, this could be written as:
> >
> > select * from
> > generate_series(1, 2) blkno ,
> > bt_page_stats('test2_col1_idx',blkno::int);
> >
> > Or, if one wants to inspect to whole relation, something like:
> >
> > select * from
> > generate_series(1, pg_relation_size('test2_col1_idx'::regclass::text) /
> 8192 - 1) blkno ,
> > bt_page_stats('test2_col1_idx',blkno::int);
>
> Good one. But not all may know the alternatives.

+1

> Do we have any
> difference in the execution times for the above query vs the new
> function introduced in the v1 patch? If there's not much difference, I
> would suggest adding an SQL function around the generate_series
> approach in the pageinspect extension for better and easier usability.
>

Based on some basic SQL execution time comparison of the two approaches, I
see that the API change, on average, is around 40% faster than the SQL.

CREATE TABLE test2 AS (SELECT generate_series(1, 5000000) AS col1);
CREATE INDEX test2_col1_idx ON test2(col1);

EXPLAIN ANALYZE
SELECT * FROM bt_page_stats('test2_col1_idx', 1, 5000);

EXPLAIN ANALYZE
SELECT * FROM GENERATE_SERIES(1, 5000) blkno,
bt_page_stats('test2_col1_idx',blkno::int);

For me, the API change returns back the data in around 74ms whereas the SQL
returns it in 102ms. So considering this and as you mentioned, the
alternative may not be that obvious to everyone, it is a fair improvement.

>
> Regards,
> Bharath Rupireddy.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-06-30 08:38:01 Re: fix stats_fetch_consistency value in postgresql.conf.sample
Previous Message Heikki Linnakangas 2022-06-30 08:21:18 Re: Strange failures on chipmunk