Re: Use streaming read API in pgstattuple.

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use streaming read API in pgstattuple.
Date: 2024-11-28 06:13:08
Message-ID: CALdSSPgvSe2gEpBzs9q1h1BHLG0oewyS2+i5M_2cjaQOtfV4hw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 26 Nov 2024 at 15:39, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Mon, 25 Nov 2024 at 21:17, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> > While reviewing other threads adopting the streaming read API in
> > various core subsystems, I spotted that pgstattuple could also
> > benefit from it. So, PFA.
> >
> > Note the refactoring around pgstat_hash_page and the change to the
> > pgstat_page signature. This is needed because pgstat_hash_page uses
> > _hash_getbuf_with_strategy rather than ReadBufferExtended, which is
> > OK, but makes adopting the streaming API impossible. So, I changed
> > that place; the new code should behave exactly the same way as the
> > old. I eliminated the additional check that
> > _hash_getbuf_with_strategy performed (blkno == P_NEW), but since the
> > streaming API already delivers the right buffer, I think that's okay.
>
> I agree that this looks okay.
>
> > This change behaves sanely in my manual testing.
>
> I encountered some problems while playing with your patch. This query
> [1] fails when the patch is applied (although CI passes); it wasn't
> failing before. I looked at the code and found the following:
>
> === 1
>
> blkno = start;
> for (;;)
> {
>     /* Get the current relation length */
>     LockRelationForExtension(rel, ExclusiveLock);
>     nblocks = RelationGetNumberOfBlocks(rel);
>     UnlockRelationForExtension(rel, ExclusiveLock);
>
> I think you need to set p.last_exclusive to nblocks here.
>
> === 2
>
> -    for (; blkno < nblocks; blkno++)
> +    while(BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
>      {
>          CHECK_FOR_INTERRUPTS();
>
> -        pagefn(&stat, rel, blkno, bstrategy);
> +        pagefn(&stat, rel, buf);
>      }
>
> blkno no longer gets incremented, which causes an infinite loop.
>
> ===
>
> Also, since this problem wasn't caught by the CI, you may want to
> add a test for the pgstat_index function.
>
> [1]
> CREATE EXTENSION pgstattuple;
> create table test (a int primary key, b int[]);
> create index test_hashidx on test using hash (b);
> select pgstattuple(oid) from pg_class where relname = 'test_hashidx';
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Hello! Thank you for taking a look. Both of the issues you found have
been fixed. Since my changes were plainly wrong, I honestly don't know
why this worked in version 1 at all. By a miracle, apparently.
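
For reference, the per-chunk loop in pgstat_index() should now look
roughly like this (a simplified sketch, not the literal patch; it
assumes the block_range_read_stream_cb callback with its
BlockRangeReadStreamPrivate state, here named "p"):

```
/*
 * Simplified sketch of the v2 pgstat_index() loop.  "p" is assumed to
 * be a BlockRangeReadStreamPrivate driving block_range_read_stream_cb.
 */
blkno = start;
for (;;)
{
    ReadStream *stream;
    Buffer      buf;

    /* Get the current relation length */
    LockRelationForExtension(rel, ExclusiveLock);
    nblocks = RelationGetNumberOfBlocks(rel);
    UnlockRelationForExtension(rel, ExclusiveLock);

    /* Quit if we've scanned the whole relation */
    if (blkno >= nblocks)
        break;

    /* Point 1 from the review: bound the stream by the current length */
    p.current_blocknum = blkno;
    p.last_exclusive = nblocks;

    stream = read_stream_begin_relation(READ_STREAM_FULL,
                                        bstrategy,
                                        rel,
                                        MAIN_FORKNUM,
                                        block_range_read_stream_cb,
                                        &p,
                                        0);

    /* Point 2: the stream, not blkno, now drives the scan */
    while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
    {
        CHECK_FOR_INTERRUPTS();
        pagefn(&stat, rel, buf);
    }
    read_stream_end(stream);

    /* Resume after everything the stream handed back */
    blkno = nblocks;
}
```

Creating the stream inside the outer loop is what lets p.last_exclusive
be refreshed against the current relation length on every pass.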

As for CI, I rechecked v1:

```
db2=# select * from pgstathashindex('test_hashidx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items | free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+--------------
       4 |            4 |              0 |            1 |            0 |          0 |          0 |          100
(1 row)

db2=# select * from pgstattuple('test_hashidx');
ERROR:  could not read blocks 6..16 in file "base/16454/16473": read only 0 of 90112 bytes
```

In v2 this behaves correctly.
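
For anyone skimming: the hash-page refactoring discussed upthread means
the page callback now receives the already-pinned buffer from the
stream instead of reading it itself. Roughly (a sketch; the locking and
sanity checks may differ in the attached patch):

```
/*
 * Sketch of pgstat_hash_page() with the new signature.  The buffer
 * arrives pinned from the read stream; we only lock it, mirroring what
 * _hash_getbuf_with_strategy() did minus the blkno == P_NEW check
 * (page sanity checks elided here).
 */
static void
pgstat_hash_page(pgstattuple_type *stat, Relation rel, Buffer buf)
{
    Page        page;

    LockBuffer(buf, HASH_READ);
    page = BufferGetPage(buf);

    /* ... the existing page-type dispatch is unchanged ... */

    UnlockReleaseBuffer(buf);
}
```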

Should we add a `pgstattuple(...)` call after each `pgstat*type*index(..)`
call everywhere in the pgstattuple regression tests?
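
Something like this, mirroring the reproducer from your footnote
(hypothetical test lines; expected output would come from an actual
regression run):

```
-- hypothetical addition to the pgstattuple regression tests
create index test_hashidx on test using hash (b);
select * from pgstathashindex('test_hashidx');
select * from pgstattuple('test_hashidx');
```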

P.S.
Something I forgot to mention in my initial email: I reused code from
https://commitfest.postgresql.org/50/5327/ with some adaptations, so
Andrey is an author of this patch too.

--
Best regards,
Kirill Reshke

Attachment: v2-0001-Use-stream-read-interface-for-pgstattuple-routine.patch (application/octet-stream, 5.3 KB)
