From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Markus Wanner <markus(at)bluegap(dot)ch> |
Subject: | Re: Proposal for CSN based snapshots |
Date: | 2014-05-30 14:59:23 |
Message-ID: | 53889CCB.7030005@vmware.com |
Lists: | pgsql-hackers |

So, here's a first version of the patch. Still very much WIP.

One thorny issue came up in discussions with other hackers on this at PGCon:

When a transaction is committed asynchronously, it becomes visible to
other backends before the commit WAL record is flushed. With CSN-based
snapshots, the order in which transactions become visible is always based
on the LSNs of their commit WAL records. This is a problem when there is
a mix of synchronous and asynchronous commits:

If transaction A commits synchronously with commit LSN 1, and
transaction B commits asynchronously with commit LSN 2, B cannot become
visible before A. And we cannot acknowledge B as committed to the client
until it's visible to other transactions. That means that B will have to
wait for A's commit record to be flushed to disk before it can return,
even though it was an asynchronous commit.

I personally think that's annoying, but we can live with it. The most
common usage of synchronous_commit=off is to run a lot of transactions
in that mode, setting it in postgresql.conf. And it wouldn't completely
defeat the purpose of mixing synchronous and asynchronous commits
either: an asynchronous commit still only needs to wait for any
already-logged synchronous commits to be flushed to disk, not the commit
record of the asynchronous transaction itself.
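
As a rough illustration of the ordering rule above (a toy model, not code from the patch), the sketch below assumes that commits become visible strictly in commit-LSN order, that a synchronous commit needs its own record flushed, and that an asynchronous commit does not. The names `Commit`, `visible_commits` and `flushed_lsn` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    xid: str           # transaction label, e.g. "A"
    lsn: int           # LSN of the commit WAL record
    synchronous: bool  # True if the commit waits for its own flush


def visible_commits(commits, flushed_lsn):
    """Return the XIDs visible to other backends, in commit-LSN order.

    Assumed rule: visibility follows WAL order, so the first synchronous
    commit whose record is not yet flushed blocks everything after it.
    """
    visible = []
    for c in sorted(commits, key=lambda c: c.lsn):
        if c.synchronous and c.lsn > flushed_lsn:
            break  # later commits in WAL order must wait as well
        visible.append(c.xid)
    return visible


# A: synchronous, commit LSN 1.  B: asynchronous, commit LSN 2.
commits = [Commit("A", 1, True), Commit("B", 2, False)]
print(visible_commits(commits, flushed_lsn=0))  # B is stuck behind A
print(visible_commits(commits, flushed_lsn=1))  # A flushed, both visible
```

In this model B never waits for its own commit record (LSN 2) to be flushed, only for A's (LSN 1), which matches the point made above.
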

Ants' original design with a separate commit-sequence-number that's
different from the commit LSN would not have this problem, because that
would allow the commits to become visible to others out of WAL order.
However, WAL order == commit order is a nice and simple property,
with other advantages.
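
To make the contrast concrete, here is a toy sketch of a separate commit-sequence-number that is handed out when a commit becomes visible rather than derived from its LSN. It only illustrates the property described above; `CsnAllocator` and `make_visible` are hypothetical names and do not reflect how Ants' design is actually implemented.

```python
import itertools


class CsnAllocator:
    """Hypothetical allocator: CSNs are assigned in visibility order."""

    def __init__(self):
        self._next = itertools.count(1)
        self.csn_of = {}  # xid -> commit sequence number

    def make_visible(self, xid):
        self.csn_of[xid] = next(self._next)
        return self.csn_of[xid]


commit_lsn = {"A": 1, "B": 2}  # WAL order: A, then B

alloc = CsnAllocator()
alloc.make_visible("B")  # asynchronous: visible right away, no flush wait
alloc.make_visible("A")  # synchronous: visible once its record is flushed

wal_order = sorted(commit_lsn, key=commit_lsn.get)
visibility_order = sorted(alloc.csn_of, key=alloc.csn_of.get)
print(wal_order, visibility_order)
```

Here B gets a lower CSN than A even though its commit record comes later in the WAL, so the two orders diverge.
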

Some bigger TODO items:
* Logical decoding is broken. I hacked on it enough that it looks
roughly sane and it compiles, but didn't spend more time to debug.
* I expanded pg_clog to 64 bits per XID, but people suggested keeping
pg_clog as is, with two bits per XID, and adding a new SLRU for the
commit LSNs beside it. Probably will need to do something like that to
avoid bloating the clog.
* Add some kind of backend-private caching of clog, to make it faster to
access. The visibility checks are now hitting the clog a lot more
heavily than before, as you need to check the clog even if the hint bits
are set, if the XID falls between xmin and xmax of the snapshot.
* Transactions currently become visible immediately when a WAL record is
inserted, before it's flushed. That's wrong, but shouldn't be difficult
to fix (except for the async commit issue explained above).
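
The visibility check and the caching idea from the TODO list can be sketched roughly as follows. This is a simplified model under stated assumptions, not the patch's code: the clog is a plain dict from XID to commit LSN (`None` if not committed), the snapshot carries a hypothetical `snapshot_lsn`, a commit is taken to be visible when its LSN is at or below that value, and only committed entries are cached because a commit LSN never changes once set.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    xmin: int          # XIDs below this finished before the snapshot
    xmax: int          # XIDs at or above this started after the snapshot
    snapshot_lsn: int  # assumed cutoff: later commits are invisible


@dataclass
class ClogReader:
    shared_clog: dict                 # xid -> commit LSN, None if not committed
    cache: dict = field(default_factory=dict)  # backend-private cache
    shared_lookups: int = 0           # counts trips to the shared clog

    def commit_lsn(self, xid):
        if xid in self.cache:
            return self.cache[xid]
        self.shared_lookups += 1
        lsn = self.shared_clog.get(xid)
        if lsn is not None:           # committed state is final, safe to cache
            self.cache[xid] = lsn
        return lsn


def xid_visible(xid, hint_committed, snap, clog):
    if xid >= snap.xmax:
        return False                  # started after the snapshot
    if xid < snap.xmin:
        # Old enough that the hint bit alone is sufficient when set.
        return hint_committed or clog.commit_lsn(xid) is not None
    # Between xmin and xmax: the clog is consulted even if the hint bit is set.
    lsn = clog.commit_lsn(xid)
    return lsn is not None and lsn <= snap.snapshot_lsn


snap = Snapshot(xmin=100, xmax=110, snapshot_lsn=500)
clog = ClogReader({95: 300, 103: 450, 105: 520, 107: None})
for xid in (95, 103, 105, 107, 112, 103):
    print(xid, xid_visible(xid, True, snap, clog))
print("shared clog lookups:", clog.shared_lookups)
```

The second check of XID 103 is answered from the private cache, which is the kind of saving the caching item is after.
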

- Heikki

Attachment | Content-Type | Size |
---|---|---|
csn-1.patch.gz | application/gzip | 102.5 KB |