From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Re: TupleTableSlot abstraction |
Date: | 2018-02-25 21:38:52 |
Message-ID: | CAPpHfdtpQEU8O4V5x+ryNjFXv2QxWT8dxioxYm=2q5WB5bKG0g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
Thank you for working on this subject. See some my comments below.
On Wed, Feb 21, 2018 at 1:43 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> - TupleTableSlots have to contain all the necessary information for each
> type of slot. Some implementations might require a buffer pin to be
> hold (our heap), others pointers to multiple underlying points of data
> (columns store), some need additional metadata (our MinimalTuple).
>
> Unifying the information into one struct makes it much harder to
> provide differing implementations without modifying core postgres
> and/or increasing overhead for every user of slots.
>
> I think the consequence of that is that we need to allow each type of
> slot to have their own TupleTableSlot embedding struct.
>
+1, I think that it's reasonable to keep minimal common slot information in
TupleTableSlot struct and to let particular slot implementations to extend
it.
Haribabu's patch solved this by adding a tts_storage parameter that
> contains additional data, but imo that's unconvincing due to the added
> indirection in very performance critical codepaths.
>
+1
- The way slots currently are implemented each new variant data is
> stored in slots adds new branches to hot code paths like
> ExecClearTuple(), and they're not extensible.
>
> Therefore I think we need to move to callback based implementation. In
> my tests, by moving the callback invocations into very thin inline
> functions, the branch prediction accuracy actually goes sligthly
> *up*.
>
Sounds interesting. Besides branch prediction accuracy, what can you
say about overall performance?
- Currently we frequently convert from one way of representing a row
> into another, in the same slot. We do so fairly implicilty in
> ExecMaterializeSlot(), ExecFetchSlotMinimalTuple(), ExecCopySlot()...
>
> The more we abstract specific storage representation away, the worse I
> think that is. I think we should make such conversions explicit by
> requiring transfers between two slots if a specific type is required.
>
>
> - We have a few codepaths that set fields on tuples that can't be
> represented in virtual slots. E.g. postgres_fdw sets ctid, oid,
> ExecProcessReturning() will set tableoid.
>
>
> In my POC I turned TupleTableSlot into basically an abstract base class:
> struct TupleTableSlot
> {
> NodeTag type;
>
> uint16 flags;
> uint16 nvalid; /* # of valid values in tts_values
> */
>
> const TupleTableSlotOps *const cb;
>
> TupleDesc tupleDescriptor; /* slot's tuple descriptor
> */
>
> Datum *values; /* current per-attribute values */
> bool *nulls; /* current per-attribute null
> flags */
>
> /* can we optimize away? */
> MemoryContext mcxt; /* slot itself is in this context
> */
> };
>
> which then is inherited from for specific implementations of tuple
> slots:
>
> typedef struct HeapTupleTableSlot
> {
> TupleTableSlot base;
> HeapTuple tuple; /* physical tuple */
> uint32 off; /* saved state for
> slot_deform_tuple */
> } HeapTupleTableSlot;
>
> ...
> typedef struct MinimalTupleTableSlot
> {
> TupleTableSlot base;
> HeapTuple tuple; /* tuple wrapper */
> MinimalTuple mintuple; /* minimal tuple, or NULL if none */
> HeapTupleData minhdr; /* workspace for minimal-tuple-only case */
> uint32 off; /* saved state for
> slot_deform_tuple */
> } MinimalTupleTableSlot;
>
>
> typedef struct TupleTableSlotOps
> {
> void (*init)(TupleTableSlot *slot);
> void (*release)(TupleTableSlot *slot);
>
> void (*clear)(TupleTableSlot *slot);
>
> void (*getsomeattrs)(TupleTableSlot *slot, int natts);
>
> void (*materialize)(TupleTableSlot *slot);
>
> HeapTuple (*get_heap_tuple)(TupleTableSlot *slot);
> MinimalTuple (*get_minimal_tuple)(TupleTableSlot *slot);
>
> HeapTuple (*copy_heap_tuple)(TupleTableSlot *slot);
> MinimalTuple (*copy_minimal_tuple)(TupleTableSlot *slot);
>
> } TupleTableSlotOps;
>
> when creating a slot one has to specify the type of slot one wants to
> use:
>
> typedef enum TupleTableSlotType
> {
> TTS_TYPE_VIRTUAL,
> TTS_TYPE_HEAPTUPLE,
> TTS_TYPE_MINIMALTUPLE,
> TTS_TYPE_BUFFER
> } TupleTableSlotType;
>
> extern TupleTableSlot *MakeTupleTableSlot(TupleDesc desc,
> TupleTableSlotType st);
>
Can user define his own slot type? If so, I assume that hard-coded
enum is a limitation of current prototype, and eventually we would have
an extendable array of slot types. Is it correct?
You might notice that I've renamed a few fields (tts_values -> values,
> tts_isnull -> nulls, tts_nvalid -> nvalid) that haven't actually changed
> in the above structs / the attached patch. I now think that's probably
> not a good idea, unnecessarily breaking more code than necessary.
>
For me "tss_" prefix looks a bit weird, but I agree that we probably should
refrain from renaming this in order to break as small number of things as
possible.
> I generally like this scheme because it allows the TupleTableStruct to
> be relatively lean, without knowing much about specific implementations
> of slots.
>
> After this change functions like slot_getattr are thin inline wrappers:
> +static inline Datum
> +slot_getattr(TupleTableSlot *slot, int attnum, bool *isnull)
> +{
> + Assert(attnum > 0);
> +
> + if (attnum > slot->nvalid)
> + slot->cb->getsomeattrs(slot, attnum);
> +
> + Assert(attnum <= slot->nvalid);
> +
> + *isnull = slot->nulls[attnum - 1];
> + return slot->values[attnum - 1];
> +}
>
> Note that I've moved system attribute handling out of the slot_getattr
> path - there were a few paths accessing them via slot_getattr, and it's
> a lot of unnecessary branching.
>
>
> The fact that slots of different types have different storage means that
> we can't just change the type of a slot when needed. In nearly all
> places that's not a problem. E.g. ExecHashTableInsert() converts the
> "incoming" slot to one containing a minimal tuple with
> ExecFetchSlotMinimalTuple(), but that's essentially transient as it's
> just used to copy the tuple into the hashtable. We could even gain
> quite some efficiency by directly copying into the allocated space
> (saving one serialization/copy step for the common case where input is
> either a virtual or a heap tuple).
>
> The slightly bigger issue is that several parts of the code currently
> require heaptuples they can scribble upon. E.g. nodeModifyTable.c
> materializes the tuple - those can be solved by using a local slot to
> store the tuple, rather than using the incoming one. In nearly all cases
> we'd still be left with the same number of tuple copy steps, as
> materializing the tuple with copy into the local storage anyway. A few
> other cases, e.g. intorel_receive(), just can use ExecCopySlotTuple() or
> such instead of materialize.
>
>
> The biggest issue I don't have a good answer for, and where I don't like
> Haribabu's solution, is how to deal with system columns like oid, ctid,
> tableoid. Haribabu's patch adds storage of those to the slot, but that
> seems like quite the cludge to me. It appears to me that the best thing
> would be to require projections for all accesses to system columns, at
> the original level of access. Then we don't need any sort of special
> codepaths dealing with them. But that's not exactly a trivial change and
> has some added complications too (junkfilter type things).
>
I also think that projections for accessing system columns is probably
the best solution for them.
> I made a reference to vectorized execution above. That's because that I
> found, while prototyping vectorized execution, that it's a very common
> need to interact with parts of the system that aren't vectorized, and
> doing so has to be extremely cheap. With the above we can have a
> TupleTableSlot type that's essentially just a cheap cursor into a batch
> of tuples. Besides making it efficiently possible to hand of slots to
> part of the system that can't deal with tuple batches, it allows to do
> fun things like having the first slot_deform_tuple() call for one tuple
> in a batch to deform a full page's worth of tuples, yielding *much*
> better cache access patterns.
>
Are you planning to use custom slots only for interaction between
vectorized and non-vectorized parts of executor, or also for interaction
inside vectorized part of executor. It seems that proposed interface
isn't suitable for interaction between two vectorized nodes. Are you
planning to introduce special kind of vectorized slots or whatever?
I haven't done that, but I think we should split ExecStoreTuple() into
> multiple versions, that only work on specific types of tuple
> slots. E.g. seqscan et al would call ExecStoreBufferHeapTuple(), other
> code dealing with tuples would call ExecStoreHeapTuple(). The relevant
> callers already need to know what kind of tuple they're dealing with,
> therefore that's not a huge burden.
>
>
> Please please note that the attached patch is *NOT* meant to be a full
> proposal or even a to be reviewed patch, it's just an illustration of
> the concepts I'm talking about.
>
> Comments? Better ideas?
>
As I get, your primary interest in this topic is vectorized execution. When
passing data from vectorized to non-vectorized parts of executor tuples are
materialized by moving them from vectorized slot to regular slot.
But what do you thing about alternative tuple formats in custom row-oriented
table access methods? It's likely we should try to minimize tuple format
conversion. Thus, some executor nodes can switch their tuple format to
match table access method tuple format. For instance, we're running
aggregate over table with custom table access method. Then hash
aggregate node may accumulate tuples in the same format as they are
received from table access method. Does it make sense?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2018-02-25 21:38:58 | Re: Online enabling of checksums |
Previous Message | Tomas Vondra | 2018-02-25 21:26:05 | Re: Online enabling of checksums |