From: | Ashwin Agrawal <aagrawal(at)pivotal(dot)io> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Asim R P <apraveen(at)pivotal(dot)io>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Subject: | Re: Pluggable Storage - Andres's take |
Date: | 2019-05-09 20:34:17 |
Message-ID: | CALfoeitTBvA2Y0nX_Hr4SoqQ7eMoCPGzYCqBs7iyYxNBm66AYQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, May 8, 2019 at 2:46 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2019-05-07 23:18:39 -0700, Ashwin Agrawal wrote:
> > On Mon, May 6, 2019 at 1:39 PM Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote:
> > > Also wish to point out, while working on Zedstore, we realized that
> > > TupleDesc from Relation object can be trusted at AM layer for
> > > scan_begin() API. As for ALTER TABLE rewrite case (ATRewriteTables()),
> > > catalog is updated first and hence the relation object passed to AM
> > > layer reflects new TupleDesc. For heapam its fine as it doesn't use
> > > the TupleDesc today during scans in AM layer for scan_getnextslot().
> > > Only TupleDesc which can trusted and matches the on-disk layout of the
> > > tuple for scans hence is from TupleTableSlot. Which is little
> > > unfortunate as TupleTableSlot is only available in scan_getnextslot(),
> > > and not in scan_begin(). Means if AM wishes to do some initialization
> > > based on TupleDesc for scans can't be done in scan_begin() and forced
> > > to delay till has access to TupleTableSlot. We should at least add
> > > comment for scan_begin() to strongly clarify not to trust Relation
> > > object TupleDesc. Or maybe other alternative would be have separate
> > > API for rewrite case.
> >
> > Just to correct my typo, I wish to say, TupleDesc from Relation object can't
> > be trusted at AM layer for scan_begin() API.
> >
> > Andres, any thoughts on above. I see you had proposed "change the
> > table_beginscan* API so it
> > provides a slot" in [1], but seems received no response/comments that time.
> > [1]
> > https://www.postgresql.org/message-id/20181211021340.mqaown4njtcgrjvr%40alap3.anarazel.de
>
> I don't think passing a slot at beginscan time is a good idea. There's
> several places that want to use different slots for the same scan, and
> we probably want to increase that over time (e.g. for batching), not
> decrease it.
>
> What kind of initialization do you want to do based on the tuple desc at
> beginscan time?
For Zedstore (a column store), we need to allocate a map (array or bitmask)
marking which columns to project for the scan. We also need to allocate
AM-internal scan descriptors corresponding to the number of attributes in
the scan. Hence, we need access to the number of attributes involved in the
scan. Since the Relation's TupleDesc currently can't be trusted, in Zedstore
we worked around this by allocating these things on the first call to
getnextslot, when we have access to the slot (switching to the memory
context used during scan_begin() to do so).
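
To make the workaround concrete, here is a rough standalone sketch of the
deferred-initialization pattern described above. It is not Zedstore code and
does not use PostgreSQL's headers: DemoTupleDesc, DemoSlot, DemoScan and the
demo_* functions are all hypothetical stand-ins, and plain calloc() stands in
for allocating in the memory context that was current during scan_begin().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; NOT PostgreSQL's real structs. */
typedef struct DemoTupleDesc { int natts; } DemoTupleDesc;
typedef struct DemoSlot { const DemoTupleDesc *desc; } DemoSlot;

typedef struct DemoScan
{
    bool  initialized;  /* has per-column state been allocated yet? */
    int   natts;        /* attribute count, taken from the slot */
    bool *project;      /* projection map: which columns to fetch */
} DemoScan;

/* scan_begin analogue: set up only state that needs no tuple descriptor. */
static DemoScan *
demo_scan_begin(void)
{
    return calloc(1, sizeof(DemoScan));     /* zeroed: initialized = false */
}

/*
 * First-call initialization, sized from the slot's descriptor.
 * A real AM would switch to the scan_begin() memory context here;
 * this sketch just uses calloc().
 */
static bool
demo_scan_lazy_init(DemoScan *scan, const DemoSlot *slot)
{
    if (scan->initialized)
        return true;

    assert(slot->desc->natts > 0);
    scan->project = calloc((size_t) slot->desc->natts, sizeof(bool));
    if (scan->project == NULL)
        return false;

    scan->natts = slot->desc->natts;
    for (int i = 0; i < scan->natts; i++)
        scan->project[i] = true;            /* assumption: project all columns */
    scan->initialized = true;
    return true;
}

/* scan_getnextslot analogue: the slot is first available here. */
static bool
demo_scan_getnextslot(DemoScan *scan, DemoSlot *slot)
{
    if (!demo_scan_lazy_init(scan, slot))
        return false;
    return false;                           /* this sketch holds no tuples */
}

static void
demo_scan_end(DemoScan *scan)
{
    free(scan->project);
    free(scan);
}
```

The cost of this approach is an extra initialized check on every
getnextslot call, plus the need to remember which allocation context the
scan-lifetime state belongs to.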