From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Luc Vlaming <luc(at)swarm64(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Paul Guo <guopa(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Multi Inserts in CREATE TABLE AS - revived patch
Date: 2020-11-27 07:46:04
Message-ID: CALj2ACUSpjR5vUNX=Z+Y8Vp3xxQ1LYVKwqAPOZNQi5_OUOkv0w@mail.gmail.com
Lists: pgsql-hackers
On Fri, Nov 27, 2020 at 12:22 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
>
> >
> > I wonder, how can the do_multi_insert() API decide when to flush? I
> > mean, based on the number of pages to flush? Do we need to pass the
> > maximum number of pages the buffered tuples can occupy and track the
> > pages the currently buffered tuples occupy to decide when to flush? Or
> > is it something that the existing table AM infrastructure already
> > supports? If we use the number of pages to decide when to flush, how
> > well does it work with parallel inserts?
>
> I was assuming each tableam would use its own logic, based on its needs
> and the tradeoffs a storage engine might want to provide. This does not
> mean it should not consider outside parameters, like the aforementioned
> memory usage.
> I think it would imply that each tableam implements its own tracking
> mechanism for how much has accumulated, how, and when to flush, because
> they might track different statistics. IMHO, given that each tableam
> would anyway want to implement its own logic for how to store a slot
> into a page, the logic for tracking these statistics seemed minor to me.
> Maybe I missed some parts that should be extracted out into a generic
> interface, however?
>
I agree with you that tracking page-level or other such info is
dependent on the table AM implementation.
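To make that concrete, here is a minimal sketch of what I have in mind;
the names (SketchAmPrivate, sketch_do_multi_insert and friends) are made
up for illustration, not from the patch. The generic code only buffers
slots, and the AM decides on its own statistics and flush policy:

/* Illustrative sketch only; the names here are hypothetical. */
#include "postgres.h"
#include "executor/tuptable.h"

/* Whatever statistics a particular AM wants to track between flushes. */
typedef struct SketchAmPrivate
{
    int     buffered_tuples;
    Size    buffered_bytes;     /* or pages, column-chunk fill, ... */
} SketchAmPrivate;

/* Placeholder for the AM's own "have we buffered enough?" heuristic. */
static bool
sketch_am_needs_flush(SketchAmPrivate *amprivate)
{
    return amprivate->buffered_tuples >= 1000;  /* AM-specific choice */
}

/* Placeholder for the AM's own flush, e.g. table_multi_insert() for heap. */
static void
sketch_am_flush(SketchAmPrivate *amprivate, TupleTableSlot **slots, int nused)
{
    /* ... write out the buffered tuples here ... */
    amprivate->buffered_tuples = 0;
    amprivate->buffered_bytes = 0;
}

static void
sketch_do_multi_insert(SketchAmPrivate *amprivate,
                       TupleTableSlot **slots, int *nused,
                       TupleTableSlot *slot)
{
    /* Buffer a copy of the incoming tuple in a pre-allocated slot. */
    ExecCopySlot(slots[(*nused)++], slot);
    amprivate->buffered_tuples++;

    /* Each AM applies its own heuristic; nothing generic forces a policy. */
    if (sketch_am_needs_flush(amprivate))
    {
        sketch_am_flush(amprivate, slots, *nused);
        *nused = 0;
    }
}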
>
> Some examples of why a tableam could decide on its own when to flush:
> - the current heap implementation could accumulate a few pages (say up
> to 64) and thereby limit the number of calls to write() and the
> accompanying blocks/context switches. This would then also make the
> writes more sequential wrt the processes, which can help with the
> flushing I presume, like how the sequential scan was optimized to
> process a consecutive set of blocks per worker (see
> table_block_parallelscan_nextpage).
> - something like zheap could accumulate data based on the number of
> columns so that a page with column data is completely filled, thereby
> limiting the write amplification.
> - something that implements an LSM storage engine might accumulate a
> full in-memory level before flushing it out.
>
Thanks for the details.
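For the heap example above (accumulate roughly 64 pages' worth before
writing), the AM-private check could look something like the following;
the 64-page limit and the byte tracking are assumptions I'm making just
for illustration, not necessarily what heap would do:

/* Hypothetical heap-style heuristic; the numbers are illustrative only. */
#include "postgres.h"

#define SKETCH_MAX_BUFFERED_PAGES   64
#define SKETCH_MAX_BUFFERED_TUPLES  1000

static bool
sketch_heap_needs_flush(Size buffered_bytes, int buffered_tuples)
{
    /*
     * Flush once the buffered tuples would roughly fill the target number
     * of heap pages, or once we run out of pre-allocated slots, whichever
     * comes first.
     */
    return buffered_bytes >= (Size) SKETCH_MAX_BUFFERED_PAGES * BLCKSZ ||
           buffered_tuples >= SKETCH_MAX_BUFFERED_TUPLES;
}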
>
> >> One thing I'm wondering is in which memory context the slots end up
> >> being allocated. I'd assume we would want to keep the slots around
> >> between flushes. If they are in the temporary context this might prove
> >> problematic however?
> >>
> >
> > I should not have used the word temporary; it actually is not
> > temporary. This memory context will be created in begin_multi_insert(),
> > all the buffered tuples are copied using this context, and it will be
> > reset at the end of each flush and reused. It can get destroyed at the
> > end in end_multi_insert(). I think we should do the same in the new
> > APIs' implementation.
> >
>
> Okay. But in which context are the slots themselves allocated then?
> Because if we allocate the slots themselves in that context and then
> reset it with each flush, we would have to re-allocate the slots on
> every flush, which seems wasteful to me?
>
Buffer slots are allocated in the memory context in which the new APIs
get called. We don't have to re-allocate the slots every time after
flushing; we only have to clear them with ExecClearTuple() and reuse
them. The memory context I specified in the MultiInsertStateData
structure is for calling table_multi_insert() inside the new
do_multi_insert() API once we decide to flush. There's a comment on the
existing table_multi_insert() usage in the copy code saying that
table_multi_insert() may leak memory; that is why we need the temporary
memory context, which is switched to just before table_multi_insert()
and reset right after it. This happens for each batch of tuples. In the
end, this context can be deleted in the end_multi_insert() API.
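In other words, the lifecycle I have in mind is roughly the following; a
hand-wavy, self-contained sketch with made-up names (SketchMultiInsertState
etc.), not the patch code, and the slot count of 1000 is arbitrary:

/* Rough sketch of the intended context/slot lifecycle; not the patch code. */
#include "postgres.h"
#include "access/heapam.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "executor/tuptable.h"
#include "utils/memutils.h"

typedef struct SketchMultiInsertState
{
    Relation        rel;
    BulkInsertState bistate;
    MemoryContext   context;        /* reset after every flush */
    TupleTableSlot *slots[1000];    /* allocated once, in caller's context */
    int             nused;
} SketchMultiInsertState;

static void
sketch_begin_multi_insert(SketchMultiInsertState *state, Relation rel)
{
    state->rel = rel;
    state->bistate = GetBulkInsertState();
    state->context = AllocSetContextCreate(CurrentMemoryContext,
                                           "multi insert",
                                           ALLOCSET_DEFAULT_SIZES);
    state->nused = 0;
    for (int i = 0; i < 1000; i++)
        state->slots[i] = table_slot_create(rel, NULL);
}

static void
sketch_flush(SketchMultiInsertState *state)
{
    MemoryContext oldcontext;

    /* table_multi_insert() may leak, hence the short-lived context. */
    oldcontext = MemoryContextSwitchTo(state->context);
    table_multi_insert(state->rel, state->slots, state->nused,
                       GetCurrentCommandId(true), 0, state->bistate);
    MemoryContextSwitchTo(oldcontext);
    MemoryContextReset(state->context);

    /* Clear and reuse the slots; no need to re-allocate them. */
    for (int i = 0; i < state->nused; i++)
        ExecClearTuple(state->slots[i]);
    state->nused = 0;
}

static void
sketch_end_multi_insert(SketchMultiInsertState *state)
{
    if (state->nused > 0)
        sketch_flush(state);
    FreeBulkInsertState(state->bistate);
    MemoryContextDelete(state->context);
    for (int i = 0; i < 1000; i++)
        ExecDropSingleTupleTableSlot(state->slots[i]);
}

The important bits are that the slots themselves live outside the
short-lived context, and only the memory that table_multi_insert() may
leak goes into the context that gets reset per batch.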
Hope this helps.
I'm planning to summarize the new APIs' description and post it here
again for more opinions.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com