From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Luc Vlaming <luc(at)swarm64(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Subject: | Re: should INSERT SELECT use a BulkInsertState? |
Date: | 2020-11-24 02:00:20 |
Message-ID: | 20201124020020.GK24052@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Nov 02, 2020 at 12:45:51PM -0600, Justin Pryzby wrote:
> On Mon, Nov 02, 2020 at 07:53:45AM +0100, Luc Vlaming wrote:
> > On 30.10.20 05:51, Justin Pryzby wrote:
> > > On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:
> > > > On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > >
> > > > > > > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.
> > > >
> > > > I think it would be better if this was self-tuning. So that we don't
> > > > allocate a bulkinsert state until we've done say 100 (?) rows
> > > > inserted.
> > >
> > > I made it an optional, non-default behavior in response to the legitimate
> > > concern for performance regression for the cases where a loader needs to be as
> > > fast as possible - as compared with our case, where we want instead to optimize
> > > for our reports by making the loaders responsible for their own writes, rather
> > > than leaving behind many dirty pages, and clobbering the cache, too.
> > >
> > > Also, INSERT SELECT doesn't immediately help us (telsasoft), since we use
> > > INSERT .. VALUES () .. ON CONFLICT. This would handle that case, which is
> > > great, even though that wasn't a design goal. It could also be an integer GUC
> > > to allow configuring the size of the ring buffer.
> > >
> > > > You should also use table_multi_insert() since that will give further
> > > > performance gains by reducing block access overheads. Switching from
> > > > single row to multi-row should also only happen once we've loaded a
> > > > few rows, so we don't introduce overahads for smaller SQL statements.
> > >
> > > Good idea...multi_insert (which reduces the overhead of individual inserts) is
> > > mostly independent from BulkInsert state (which uses a ring-buffer to avoid
> > > dirtying the cache). I made this 0002.
> > >
> > > This makes INSERT SELECT several times faster, and not clobber the cache too.
- Rebased on Heikki's copy.c split;
- Rename structures without "Copy" prefix;
- Move MultiInsert* from copyfrom.c to (tentatively) nodeModifyTable.h;
- Move cur_lineno and transition_capture into MultiInsertInfo;
This switches to multi insert after a configurable number of tuples.
If set to -1, that provides the historic behavior that bulk inserts
can leave behind many dirty buffers. Perhaps that should be the default.
I guess this shouldn't be in copy.h or in commands/* at all.
It'll be included by both: commands/copyfrom_internal.h and
executor/nodeModifyTable.h. Maybe it should go in util or lib...
I don't know how to do it without including executor.h, which seems
to be undesirable.
--
Justin
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Allow-INSERT-SELECT-to-use-a-BulkInsertState.patch | text/x-diff | 9.0 KB |
v6-0002-Make-INSERT-SELECT-use-multi_insert.patch | text/x-diff | 41.0 KB |
v6-0003-Dynamically-switch-to-multi-insert-mode.patch | text/x-diff | 7.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2020-11-24 02:03:32 | Re: "as quickly as possible" (was: remove spurious CREATE INDEX CONCURRENTLY wait) |
Previous Message | Michael Paquier | 2020-11-24 01:57:40 | Re: abstract Unix-domain sockets |