From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> |
Subject: | Re: Table AM Interface Enhancements |
Date: | 2024-04-13 09:28:38 |
Message-ID: | CAPpHfdujpyy-FcrCM=0qjXWZm3Pc9hfwBuvAqpLjT847DXOamg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, Melanie!
On Fri, Apr 12, 2024 at 8:48 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Thu, Apr 11, 2024 at 6:04 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > On Fri, Apr 12, 2024 at 12:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2024-04-11 20:46:02 +0300, Alexander Korotkov wrote:
> > > > I hope this work is targeting pg18.
> > >
> > > I think anything of the scope discussed by Melanie would be very clearly
> > > targeting 18. For 17, I don't know yet whether we should revert the the
> > > ANALYZE streaming read user (041b96802ef), just do a bit of comment polishing,
> > > or some other small change.
> > >
> > > One oddity is that before 041b96802ef, the opportunities for making the
> > > interface cleaner were less apparent, because c6fc50cb4028 increased the
> > > coupling between analyze.c and the way the table storage works.
> >
> > Thank you for pointing this out about c6fc50cb4028, I've missed this.
> >
> > > > Otherwise, do I get this right that this post feature-freeze works on
> > > > designing a new API? Yes, 27bc1772fc masked the problem. But it was
> > > > committed on Mar 30.
> > >
> > > Note that there were versions of the patch that were targeting the
> > > pre-27bc1772fc interface.
> >
> > Sure, I've checked this before writing. It looks quite similar to the
> > result of applying my revert patch [1] to the head.
> >
> > Let me describe my view over the current situation.
> >
> > 1) If we just apply my revert patch and leave c6fc50cb4028 and
> > 041b96802ef in the tree, then we get our table AM API narrowed. As
> > you expressed the current API requires block numbers to be 1:1 with
> > the actual physical on-disk location [2]. Not a secret I think the
> > current API is quite restrictive. And we're getting the ANALYZE
> > interface narrower than it was since 737a292b5de. Frankly speaking, I
> > don't think this is acceptable.
> >
> > 2) Pushing down the read stream and prefetch to heap am is related to
> > difficulties [3], [4]. That's quite a significant piece of work to be
> > done post FF.
>
> I had operated under the assumption that we needed to push the
> streaming read code into heap AM because that is what we did for
> sequential scan, but now that I think about it, I don't see why we
> would have to. Bilal's patch pre-27bc1772fc did not do this. But I
> think the code in acquire_sample_rows() isn't more tied to heap AM
> after 041b96802ef than it was before it. Are you of the opinion that
> the code with 041b96802ef ties acquire_sample_rows() more closely to
> heap format?
Yes, I think so. Table AM API deals with TIDs and block numbers, but
doesn't force on what they actually mean. For example, in ZedStore
[1], data is stored on per-column B-trees, where TID used in table AM
is just a logical key of that B-trees. Similarly, blockNumber is a
range for B-trees.
c6fc50cb4028 and 041b96802ef are putting to acquire_sample_rows() an
assumption that we are sampling physical blocks as they are stored in
data files. That couldn't anymore be some "logical" block numbers
with meaning only table AM implementation knows. That was pointed out
by Andres [2]. I'm not sure if ZedStore is alive, but there could be
other table AM implementations like this, or other implementations in
development, etc. Anyway, I don't feel good about narrowing the API,
which is there from pg12.
Links.
1. https://www.pgcon.org/events/pgcon_2020/sessions/session/44/slides/13/Zedstore-PGCon2020-Virtual.pdf
2. https://www.postgresql.org/message-id/20240410212117.mxsldz2w6htrl36v%40awork3.anarazel.de
------
Regards,
Alexander Korotkov
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2024-04-13 10:04:58 | Re: Add SPLIT PARTITION/MERGE PARTITIONS commands |
Previous Message | Tomas Vondra | 2024-04-13 09:19:21 | Re: Parallel CREATE INDEX for BRIN indexes |