From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: INSERT ... ON CONFLICT UPDATE and logical decoding
Date: 2015-03-03 11:13:42
Message-ID: 20150303111342.GF2579@alap3.anarazel.de
Lists: pgsql-hackers
On 2015-02-25 14:04:55 -0800, Peter Geoghegan wrote:
> On Wed, Feb 25, 2015 at 3:26 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I'm pretty sure this will entirely fail if you have a transaction that's
> > large enough to spill to disk. Calling ReorderBufferIterTXNNext() will
> > reuse the memory for the in memory changes.
> >
> > It's also borked because it skips a large number of checks for the next
> > change. You're entirely disregarding what the routine does otherwise -
> > it could be a toast record, it could be a change to a system table, it
> > could be a new snapshot...
> >
> > Also, this seems awfully fragile in the presence of toast rows and such?
> > Can we be entirely sure that the next WAL record logged by that session
> > would be the internal super deletion?
>
> toast_save_datum() is called with a heap_insert() call before heap
> insertion for the tuple proper. We're relying on the assumption that
> if there is no immediate super deletion record, things are fine. We
> cannot speculatively insert into catalogs, BTW.
I fail to see what prohibits, e.g., another catalog-modifying transaction
from committing in between and distributing a new snapshot.
> Why should we see a REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case
> next, in the case that we need to care about (when the tuple was super
> deleted immediately afterwards)?
It's irrelevant whether you care about it or not. Your
ReorderBufferIterTXNNext() consumes a change that needs to actually be
processed. It could very well be something important like a new
snapshot.
> As for the idea of writing WAL separately, I don't think it's worth
> it. We'd need to go exclusive lock the buffer again, which seems like
> a non-starter.
Meh^2. That won't be the most significant cost of an UPSERT.
> While what I have here *is* very ugly, I see no better alternative. By
> doing what you suggest, we'd be finding a special case way of safely
> violating the general "WAL log-before-data" rule.
No, it wouldn't. In a bunch of places we don't WAL-log modifications that
don't need to persist. It'd be perfectly fine to start an upsert with an
'acquiring an insertion lock' record that contains pretty much only the
item pointer (and an FPI if necessary to prevent torn pages), and then
log the full new record only once it's certain that the whole thing needs
to survive. Redo can then simply clean out the space touched by the
insertion lock.
> Performance aside, that seems like a worse, less isolated kludge...no?
No. I'm not sure it'll come out significantly better, but I don't see it
coming out much worse. WAL logging where the result of the WAL-logged
action essentially can't be determined until many records later isn't
pretty.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services