Re: Pluggable storage

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable storage
Date: 2017-08-23 05:26:03
Message-ID: CAJrrPGfCLTaD8DOzYqm44TbPYcOicp9eyGA4nDW=R6EN=mrNig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 15, 2017 at 4:53 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
>
> On 2017-06-13 11:50:27 +1000, Haribabu Kommi wrote:
> > Here I attached WIP patches to support pluggable storage. The patch
> series
> > are may not work individually. Still so many things are under
> development.
> > These patches are just to share the approach of the current development.
>
> Making a pass through the patchset to get a feel where this at, and
> where this is headed. I previously skimmed the thread to get a rough
> sense on what's discused, but not in a very detailed manner.
>

Thanks for the review.

General:
>
> - I think one important discussion we need to have is what kind of
> performance impact we're going to accept introducing this. It seems
> very likely that this'll cause some slowdown. We can kind of
> alleviate that by doing some optimizations at the same time, but
> nevertheless, this abstraction is going to cost.
>

OK. May be to take some decision, we may need some performance
figures, I will measure the performance once the API's stabilized.

- I don't think we should introduce this without a user besides
> heapam. The likelihood that API will be usable by anything else
> without a testcase seems fairly remote. I think some src/test/modules
> type implementation of a per-session, in-memory storage - relatively
> easy to implement - or such is necessary.
>

Sure, I will add a test module once the API's are stabilized.

> - I think, and detailed some of that, we're going to need some cleanups
> that go in before this, to decrease the size / increase the quality of
> the new APIs. It's going to get more painful to change APIs
> subsequently.
>
> - We'll have to document clearly that these APIs are going to change for
> a while, even after the release introducing them.
>

Yes, that's correct, because this is the first time we are developing the
storage API's to support pluggable storage, so it may needs some refinements
based on the usage to support different storage methods.

> StorageAm - Scan stuff:
>
> - I think API isn't quite right. There's a lot of granular callback
> functionality like scan_begin_function / scan_begin_catalog /
> scan_begin_bm - these largely are convenience wrappers around the same
> function, and I don't think that would, or rather should, change in
> any abstracted storage layer. So I think there needs to be some
> unification first (pretty close w/ beginscan_internal already, but
> perhaps we should get rid of a few of these wrappers).
>

OK. I will change the API to add a function to beginscan_internal and
replace
the rest of the functions usage with beginscan_internal. And also there are
many bool flags that are passed to the beginscan_internal, I will try to
optimize
them also.

> - Some of the exposed functionality, e.g. scan_getpage,
> scan_update_snapshot, scan_rescan_set_params looks like it should just
> be excised, i.e. there's no reason for it to exist.
>

Currently these API's are used only in Bitmap and Sample scan's.
These scan methods are fully depends on the heap format. I will
check how to remove these API's.

- Minor: don't think the _function suffix for Storis necessary, just
> makes things long, and every member has it. Besides that, it's also
> easy to misunderstand - for a second I understood
> scan_getnext_function to be about getting the next function...
>

OK. How about adding _hook?

> - Scans are still represented as HeapScanDesc - I don't think that's
> going to fly. Either this needs to be an opaque type (i.e. a struct
> that's not defined, just forward declared), or it needs to be a base
> struct that individual AMs embed in their own structs. Individual AMs
> definitely are going to need different pieces of data.
>

Currently the internal members of the HeapScanDesc are directly used
in many places especially in Bitmap and Sample scan's. I am yet to write
the code in a best way to handle these scan methods and then removing
its usage will be easy.

> Storage AM - tuple stuff:
>
> - tuple_get_{xmin, updated_xid, cmin, itempointer, ctid, heaponly} are
> each individual functions, that seems pretty painful to maintain, and
> v/ likely to just grow and grow. Not sure what the solution is, but
> this seems like a hard sell.
>

OK. How about adding a one API and takes some flags to represent
what type of data that is needed from the tuple and returned the
corresponding
data as void *. The caller must typecast the data to their corresponding
type before use it.

- The three *speculative* functions don't quite seem right to me, nor do
> I understand:
> + *
> + * Setting a tuple's speculative token is a slot-only operation,
> so no need
> + * for a storage AM method, but after inserting a tuple containing
> a
> + * speculative token, the insertion must be completed by these
> routines:
> + */
> I don't see anything related to slots, here?
>

The tuple_set_speculative_token API is not required. Just update the slot
member directly with speculative token is fine and this value is used in
the tuple_insert API to form the tuple with speculative token. Later with
the other two API's the tuple is either finished or aborted.

> Storage AM - slot stuff:
>
>
> - I think there's a number of wrapper functions (slot_getattr,
> slot_getallattrs, getsomeattrs, attisnull) around the same
> functionality - that bloats the API and causes slowdowns. Instead we
> need something like slot_virtualize_tuple(int16 upto), and the rest
> should just be wrappers.
>

OK. I will change accordingly.

> - I think it's wrong to have the slot functionality defined on the
> StorageAm level. That'll cause even more indirect function calls (=>
> slowness), and besides that the TupleTableSlot struct will need
> members for each and every Am.
>
> I think the solution is to instead have an Am level "create slot"
> function, and the returned slot is allocated by the Am, with a base
> member of TupleTableSlot with basically just tts_nvalid, tts_values,
> tts_isnull as members. Those are the only members that can be
> accessed without functions.
>
> Access to the individual functions (say store_tuple) would then be
> direct members of the TupleTableSlot interface. While that costs a bit
> of memory, it removes one indirection from an already performance
> critical path.
>

OK. This will change the structure to hold the minimal members that are
accessed irrespective of storage AM, and rest of data will be a void pointer
that can be accessed by only the storage AM.

- MinimalTuples should be one type of slot for the above, except it's
> not created by an StorageAm but by a function returning a
> TupleTableSlot.
>
> This should remove the need for the slot_copy_min_tuple,
> slot_is_physical_tuple functions.
>

OK.

> - Right now TupleTableSlots are an executor datastructure, but
> these patches (rightly!) make it much more widely used. So I think it
> needs to be moved outside of executor/, and probably renamed to
> something like TupleHolder or something.
>

OK.

> - The oid integration seems wrong - without an accessor oids won't be
> queryable with this unless you break through the API. But from a
> higher level view I do wonder if it's not time to remove "special" oid
> columns and replace them with a normal column. We should be hesitant
> enshrining crusty old concepts in new APIs.
>

OK.

> Executor integration:
>
> - I'm quite fearful that this'll cause slowdowns in a few tight paths.
> The most likely cases here seem to be a) bitmap indexscans b)
> indexscans c) highly selective sequential scans. I do wonder if
> that can be partially addressed by switching out the individual
> executor routines in the relevant scan nodes by something using or
> similar to the infrastructure in cc9f08b6b8
>

Sorry, I didn't understand this point clearly. Can you provide some more
details.

Regards,
Hari Babu
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2017-08-23 05:35:04 Re: Pluggable storage
Previous Message Dilip Kumar 2017-08-23 04:15:35 Re: Proposal: Improve bitmap costing for lossy pages