From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Paul Guo <guopa(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Multi Inserts in CREATE TABLE AS - revived patch |
Date: | 2020-12-30 15:04:12 |
Message-ID: | CANP8+jKLv0Et4nCmEvrSqxgji9hgat9kQ6ndH3c7JE0uLPd2gw@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, 16 Nov 2020 at 15:32, Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Nov 16, 2020 at 8:02 PM Paul Guo <guopa(at)vmware(dot)com> wrote:
> >
> > > On Nov 13, 2020, at 7:21 PM, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 10, 2020 at 3:47 PM Paul Guo <guopa(at)vmware(dot)com> wrote:
> > >>
> > >> Thanks for doing this. There might be another solution - use raw insert interfaces (i.e. raw_heap_insert()).
> > >> Attached is the test (not formal) patch that verifies this idea. raw_heap_insert() writes the page into the
> > >> table files directly and also writes the FPI xlog when the tuples fill up the whole page. This seems to be
> > >> more efficient.
> > >>
> > >
> > > Thanks. Will the new raw_heap_insert() APIs scale well (i.e. extend
> > > the table in parallel) with parallelism? The existing
> > > table_multi_insert() API scales well, see, for instance, the benefit
> > > with parallel copy[1] and parallel multi inserts in CTAS[2].
> >
> > Yes, some work definitely needs to be done to make the raw heap insert interfaces fit parallel work, but
> > it seems that there are no hard blocking issues for this?
> >
>
> I may be wrong here. If we were to allow raw heap insert APIs to
> handle parallelism, wouldn't we need some sort of shared memory to
> allow coordination among workers? If we do so, in the end, aren't
> these raw insert APIs equivalent to the current table_multi_insert() API,
> which uses a separate shared ring buffer (bulk insert state) for
> insertions?
>
> And can we think of these raw insert APIs as similar to the behaviour of
> table_multi_insert() API for unlogged tables?
I found the additional performance of Paul Guo's work to be compelling
and the idea workable for very large loads.

Surely LockRelationForExtension() is all the inter-process
coordination we need to make this work for parallel loads?
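
To make that question concrete, here is a toy model in Python. It is not PostgreSQL code and not taken from any patch in this thread. It assumes each worker fills a private page and takes a lock only to reserve the next block number, with a threading.Lock standing in for LockRelationForExtension(). The names ToyRelation, PAGE_CAPACITY, worker and parallel_load are all hypothetical, and WAL/FPI logging is left out entirely.

```python
import threading

PAGE_CAPACITY = 4  # tuples per page in this toy model (assumption, not a PostgreSQL value)


class ToyRelation:
    """Toy stand-in for a heap relation: a list of pages plus an extension lock."""

    def __init__(self):
        self.pages = []
        # Plays the role LockRelationForExtension() is proposed to play above.
        self.extension_lock = threading.Lock()

    def extend(self):
        """Reserve a new block number; the only step that needs coordination."""
        with self.extension_lock:
            self.pages.append(None)
            return len(self.pages) - 1

    def write_page(self, blkno, page):
        # Each block number is handed to exactly one worker, so no lock is taken here.
        self.pages[blkno] = page


def worker(rel, tuples):
    """Fill private pages and write each one out when it is full."""
    page = []
    for t in tuples:
        page.append(t)
        if len(page) == PAGE_CAPACITY:
            rel.write_page(rel.extend(), page)
            page = []
    if page:  # flush the final, partially filled page
        rel.write_page(rel.extend(), page)


def parallel_load(rows, nworkers):
    """Split rows across nworkers threads and load them into a fresh ToyRelation."""
    rel = ToyRelation()
    chunks = [rows[i::nworkers] for i in range(nworkers)]
    threads = [threading.Thread(target=worker, args=(rel, c)) for c in chunks]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return rel
```

In this sketch the workers share nothing except the lock and the block counter, which is the property being asked about. Whether the real raw insert path needs more shared state than that is the open question.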
--
Simon Riggs http://www.EnterpriseDB.com/