From: | Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Pluggable Storage - Andres's take |
Date: | 2018-07-04 10:11:21 |
Message-ID: | CAJrrPGeSycj+JiTZ2M+yX_z457EY3vEP0PzCdfoguO055f=5Wg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jul 3, 2018 at 5:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> As I've previously mentioned I had planned to spend some time to polish
> Haribabu's version of the pluggable storage patch and rebase it on the
> vtable based slot approach from [1]. While doing so I found more and
> more things that I previously hadn't noticed. I started rewriting things
> into something closer to what I think we want architecturally.
>
Thanks for the deep review and changes.
> The current state of my version of the patch is *NOT* ready for proper
> review (it doesn't even pass all tests, there's FIXME / elog()s). But I
> think it's getting close enough to it's eventual shape that more eyes,
> and potentially more hands on keyboards, can be useful.
>
I will try to update it to make sure that it passes all the tests and also
try to
reduce the FIXME's.
> The most fundamental issues I had with Haribabu's last version from [2]
> are the following:
>
> - The use of TableTuple, a typedef from void *, is bad from multiple
> fronts. For one it reduces just about all type safety. There were
> numerous bugs in the patch where things were just cast from HeapTuple
> to TableTuple to HeapTuple (and even to TupleTableSlot). I think it's
> a really, really bad idea to introduce a vague type like this for
> development purposes alone, it makes it way too hard to refactor -
> essentially throwing the biggest benefit of type safe languages out of
> the window.
>
My earlier intention was to remove the HeapTuple usage entirely and replace
it with slot everywhere outside the tableam. But it ended up with TableTuple
before it reach to the stage because of heavy use of HeapTuple.
> Additionally I think it's also the wrong approach architecturally. We
> shouldn't assume that a tuple can efficiently be represented as a
> single palloc'ed chunk. In fact, we should move *away* from relying on
> that so much.
>
> I've thus removed the TableTuple type entirely.
>
Thanks for the changes, I didn't check the code yet, so for now whenever the
HeapTuple is required it will be generated from slot?
> - Previous verions of the patchset exposed Buffers in the tableam.h API,
> performed buffer locking / pinning / ExecStoreTuple() calls outside of
> it. That is wrong in my opinion, as various AMs will deal very
> differently with buffer pinning & locking. The relevant logic is
> largely moved within the AM. Bringing me to the next point:
>
>
> - tableam exposed various operations based on HeapTuple/TableTuple's
> (and their Buffers). This all need be slot based, as we can't
> represent the way each AM will deal with this. I've largely converted
> the API to be slot based. That has some fallout, but I think largely
> works. Lots of outdated comments.
>
Yes, I agree with you.
> - I think the move of the indexing from outside the table layer into the
> storage layer isn't a good idea. It lead to having to pass EState into
> the tableam, a callback API to perform index updates, etc. This seems
> to have at least partially been triggered by the speculative insertion
> codepaths. I've reverted this part of the changes. The speculative
> insertion / confirm codepaths are now exposed to tableam.h - I think
> that's the right thing because we'll likely want to have that
> functionality across more than a single tuple in the future.
>
>
> - The visibility functions relied on the *caller* performing buffer
> locking. That's not a great idea, because generic code shouldn't know
> about the locking scheme a particular AM needs. I've changed the
> external visibility functions to instead take a slot, and perform the
> necessary locking inside.
>
When I first moved all the visibility functions as part of tableam, I find
this
problem, and it will be good if it takes of buffer locking and etc.
> - There were numerous tableam callback uses inside heapam.c - that makes
> no sense, we know what the storage is therein. The relevant
>
>
> - The integration between index lookups and heap lookups based on the
> results on a index lookup was IMO too tight. The index code dealt
> with heap tuples, which isn't great. I've introduced a new concept, a
> 'IndexFetchTableData' scan. It's initialized when building an index
> scan, and provides the necessary state (say current heap buffer), to
> do table lookups from within a heap.
>
I agree that it will be good with the new concept from index to access the
heap.
> - The am of relations required for bootstrapping was set to 0 - I don't
> think that's a good idea. I changed it so it's set to the heap AM as
> well.
>
>
> - HOT was encoded in the API in a bunch of places. That doesn't look
> right to me. I tried to improve a bit on that, but I'm not yet quite
> sure I like it. Needs written explanation & arguments...
>
>
> - the heap tableam did a heap_copytuple() nearly everywhere. Leading to
> a higher memory usage, because the resulting tuples weren't freed or
> anything. There might be a reason for doing such a change - we've
> certainly discussed that before - but I'm *vehemently* against doing
> that at the same time we introduce pluggable storage. Analyzing the
> performance effects will be hard enough without changes like this.
>
How about using of slot instead of tuple and reusing of it? I don't know
yet whether it is possible everywhere.
>
> - I've for now backed out the heap rewrite changes, partially. Mostly
> because I didn't like the way the abstraction looks, but haven't quite
> figured out how it should look like.
>
>
> - I did not like that speculative tokens were moved to slots. There's
> really no reason for them to live outside parameters to tableam.h
> functsions.
>
>
> - lotsa additional smaller changes.
>
>
> - lotsa new bugs
>
Thanks for all the changes.
>
> My current working state is at [3] (urls to clone repo are at [4]).
> This is *HEAVILY WIP*. I plan to continue working on it over the next
> days, but I'll temporarily focus onto v11 work. If others want I could
> move repo to github and grant others write access.
>
Yes, I want to access the code and do further development on it.
>
> Tasks / Questions:
>
> - split up patch
>
How about generating refactoring changes as patches first based on
the code in your repo as discussed here[1]?
> - Change heap table AM to not allocate handler function for each table,
> instead allocate it statically. Avoids a significant amount of data
> duplication, and allows for a few more compiler optimizations.
>
Some kind of static variable handlers for each tableam, but need to check
how can we access that static handler from the relation.
> - Merge tableam.h and tableamapi.h and make most tableam.c functions
> small inline functions. Having one-line tableam.c wrappers makes this
> more expensive than necessary. We'll have a big enough trouble not
> regressing performancewise.
>
OK.
> - change scan level slot creation to use tableam function for doing so
> - get rid of slot->tts_tid, tts_tupleOid and potentially tts_tableOid
>
so with this there shouldn't be a way from slot to tid mapping or there
should be some other way.
> - COPY's multi_insert path should probably deal with a bunch of slots,
> rather than forming HeapTuples
>
OK.
- bitmap index scans probably need a new tableam.h callback, abstracting
> bitgetpage()
>
OK.
Regards,
Haribabu Kommi
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Ashutosh Bapat | 2018-07-04 10:11:35 | Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled. |
Previous Message | Etsuro Fujita | 2018-07-04 10:06:43 | Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled. |