From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Thom Brown <thom(at)linux(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: change in LOCK behavior |
Date: | 2012-10-16 21:37:47 |
Message-ID: | CA+CSw_tFbZ71jWDJGXtvJSrHPZ2y16yb6QDvCTRy4eRr=NsYiA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Oct 11, 2012 at 7:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Maybe what we really need is to find a way to make taking a snapshot a
> lot cheaper, such that the whole need for this patch goes away. We're
> not going to get far with the idea of making SnapshotNow MVCC-safe
> unless it becomes a lot cheaper to get an MVCC snapshot. I recall some
> discussion of trying to reduce a snapshot to a WAL offset --- did that
> idea crash and burn, or is it still viable?
This was mostly covered in the cheaper snapshots thread. [1] Robert
decided to abandon the idea after concluding that the memory overhead
was untenable with very old snapshots. [2] I had a really hand-wavy
idea of lazily converting snapshots from sequence-number-based
snapshots to traditional list-of-xids snapshots to limit the overhead.
That idea was promptly shot down because in that incarnation it needed
snapshots to be stored in shared memory. [3] I have done some more
thinking on this topic, although I have to admit that it has been on
the back burner. It seems to me that the problems are all surmountable.
To recap briefly, the idea is to define visibility and snapshots
through commit sequence numbers (LSNs have problems due to async
commit). The tricky part is the data structure that supports fast
xid-to-CSN lookup for visibility checks. To support visibility checks,
enough information needs to be kept so that the oldest CSN based
snapshot can resolve its xmin-xmax range to CSNs. My idea currently is
to have two fixed size shared memory buffers and an overflow log. The
first, a ring buffer, is a dense array mapping xids to CSNs. Entries
that overflow the dense ring buffer are checked for whether they might
still be invisible to any CSN based snapshot, and if so are inserted
into the sparse buffer. The sparse buffer is a sorted array containing
xid-CSN pairs that are still running or are concurrent with an active
CSN based snapshot. Once the sparse buffer fills up, the smallest
xid-CSN pairs are evicted to the CSN log. Long running CSN based
snapshots then need to read this log to build up their
SnapshotData->xip/subxip arrays. Backends can either discover that
their snapshots' CSN values have overflowed by checking the
appropriate horizon value, or be signaled via an interrupt so that
the CSN log can be cleaned up ASAP.
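To make the recap above more concrete, here is a minimal single-process C sketch of the three-tier lookup. Everything in it is an assumption for illustration: the names (`CsnMap`, `find_slot`, `overflow_oldest`, `CSN_ALL_VISIBLE`), the tiny buffer sizes, the sentinel CSN values, and the in-memory array standing in for the CSN log are hypothetical. It ignores shared memory, locking, subtransactions, xid wraparound and the horizon maintenance that would let `oldest_snapshot_csn` advance again. It only shows how entries flow from the dense ring to the sparse buffer to the log, and how a visibility check reduces to "committed with a CSN below the snapshot's CSN".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

/* Hypothetical sentinels; real CSNs start at 2 in this sketch. */
#define CSN_INPROGRESS  ((CommitSeqNo) 0)   /* not committed yet */
#define CSN_ALL_VISIBLE ((CommitSeqNo) 1)   /* dropped: visible to every snapshot */
#define DENSE_SIZE  4                       /* tiny sizes so overflow is easy to see */
#define SPARSE_SIZE 2
#define LOG_SIZE    16

typedef struct { TransactionId xid; CommitSeqNo csn; } XidCsn;

typedef struct {
    CommitSeqNo next_csn;             /* next commit sequence number */
    CommitSeqNo oldest_snapshot_csn;  /* oldest CSN snapshot taken so far */
    TransactionId dense_base;         /* oldest xid still in the dense ring */
    TransactionId dense_next;         /* next xid to assign */
    CommitSeqNo dense[DENSE_SIZE];    /* slot = xid % DENSE_SIZE */
    XidCsn sparse[SPARSE_SIZE];       /* sorted by xid */
    int nsparse;
    XidCsn log[LOG_SIZE];             /* in-memory stand-in for the CSN log */
    int nlog;
} CsnMap;

/* Start state: no snapshots yet; xids start at PostgreSQL's FirstNormalTransactionId (3). */
#define CSNMAP_INIT { .next_csn = 2, .oldest_snapshot_csn = UINT64_MAX, \
                      .dense_base = 3, .dense_next = 3 }

/* Taking a CSN snapshot is just reading the counter. */
static CommitSeqNo take_snapshot(CsnMap *m) {
    CommitSeqNo snap = m->next_csn;
    if (snap < m->oldest_snapshot_csn)
        m->oldest_snapshot_csn = snap;    /* never advanced again in this sketch */
    return snap;
}

/* Push the oldest dense entry out; keep it only if some snapshot may not see it. */
static void overflow_oldest(CsnMap *m) {
    TransactionId xid = m->dense_base++;
    CommitSeqNo csn = m->dense[xid % DENSE_SIZE];

    if (csn != CSN_INPROGRESS && csn < m->oldest_snapshot_csn)
        return;                           /* committed before every snapshot */
    if (m->nsparse == SPARSE_SIZE) {
        assert(m->nlog < LOG_SIZE);
        m->log[m->nlog++] = m->sparse[0]; /* evict the smallest xid to the log */
        memmove(&m->sparse[0], &m->sparse[1], (SPARSE_SIZE - 1) * sizeof(XidCsn));
        m->nsparse--;
    }
    /* xids leave the ring in increasing order, so appending keeps sparse sorted */
    m->sparse[m->nsparse++] = (XidCsn) {xid, csn};
}

static TransactionId assign_xid(CsnMap *m) {
    if (m->dense_next - m->dense_base == DENSE_SIZE)
        overflow_oldest(m);
    m->dense[m->dense_next % DENSE_SIZE] = CSN_INPROGRESS;
    return m->dense_next++;
}

/* Find the CSN slot of an assigned xid; NULL means it was dropped as all-visible. */
static CommitSeqNo *find_slot(CsnMap *m, TransactionId xid) {
    if (xid >= m->dense_base)
        return &m->dense[xid % DENSE_SIZE];
    for (int lo = 0, hi = m->nsparse - 1; lo <= hi;) {   /* binary search */
        int mid = (lo + hi) / 2;
        if (m->sparse[mid].xid == xid) return &m->sparse[mid].csn;
        if (m->sparse[mid].xid < xid) lo = mid + 1; else hi = mid - 1;
    }
    for (int i = 0; i < m->nlog; i++)     /* slow path: scan the CSN log */
        if (m->log[i].xid == xid) return &m->log[i].csn;
    return NULL;
}

static void commit_xid(CsnMap *m, TransactionId xid) {
    CommitSeqNo *slot = find_slot(m, xid);
    assert(slot != NULL && *slot == CSN_INPROGRESS);
    *slot = m->next_csn++;
}

/* Visible iff committed with a CSN below the snapshot's CSN. */
static bool xid_visible(CsnMap *m, TransactionId xid, CommitSeqNo snapshot_csn) {
    CommitSeqNo *slot = find_slot(m, xid);
    CommitSeqNo csn = slot ? *slot : CSN_ALL_VISIBLE;
    return csn != CSN_INPROGRESS && csn < snapshot_csn;
}
```

In this toy version, taking a snapshot costs one counter read, while the extra work lands in `overflow_oldest` and in the linear scan of the log, which only long-lived snapshots looking up old xids ever reach.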
I still have to work out some details: how to handle subtransaction
overflow, how to maintain reasonably fresh values for the different
horizons, and which ordering barriers are necessary to get lock-free
visibility checks. The idea currently seems workable and would make
taking snapshots really cheap, while the worst-case maintenance
overhead is mostly shifted to sessions that run lots of writing
transactions and hold snapshots open for a long time.
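The open ordering question can be illustrated with one possible scheme. This is an assumption, not something the message proposes: the committer first marks its slot as "committing", then draws a CSN from a global counter, then publishes the CSN, and a reader that sees the marker waits for it to resolve. The C11 sketch below uses sequentially consistent `<stdatomic.h>` operations throughout for simplicity. `CSN_COMMITTING`, `csn_slots`, `NSLOTS` and the xid-to-slot mapping are hypothetical, and a real implementation would want the weakest barriers that are still correct, plus a proper spin-delay.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t CommitSeqNo;

/* Hypothetical sentinels; real CSNs start at 2 in this sketch. */
#define CSN_INPROGRESS   ((CommitSeqNo) 0)
#define CSN_COMMITTING   ((CommitSeqNo) 1)
#define CSN_FIRST_NORMAL ((CommitSeqNo) 2)
#define NSLOTS 64                     /* toy xid -> slot mapping */

static _Atomic CommitSeqNo next_csn = CSN_FIRST_NORMAL;
static _Atomic CommitSeqNo csn_slots[NSLOTS];   /* zero-initialized: all in progress */

/* A snapshot is one atomic read of the global counter. */
static CommitSeqNo take_snapshot(void) {
    return atomic_load(&next_csn);
}

/* Committer: mark, draw a CSN, publish. All operations are seq_cst. */
static CommitSeqNo commit_xid(uint32_t xid) {
    _Atomic CommitSeqNo *slot = &csn_slots[xid % NSLOTS];

    assert(atomic_load(slot) == CSN_INPROGRESS);
    /* The marker must be visible before the counter moves past our CSN. */
    atomic_store(slot, CSN_COMMITTING);
    CommitSeqNo csn = atomic_fetch_add(&next_csn, 1);
    atomic_store(slot, csn);
    return csn;
}

/* Reader: takes no lock, but waits while a commit is between its two stores. */
static bool xid_visible(uint32_t xid, CommitSeqNo snapshot_csn) {
    _Atomic CommitSeqNo *slot = &csn_slots[xid % NSLOTS];
    CommitSeqNo csn;

    while ((csn = atomic_load(slot)) == CSN_COMMITTING)
        ;                             /* a real version would use a spin-delay */
    return csn != CSN_INPROGRESS && csn < snapshot_csn;
}
```

The marker is what makes the check safe here. If the committer only stored the final CSN after `atomic_fetch_add`, a snapshot taken between those two steps could hold a CSN above the committer's and still read the slot as in progress, treating a transaction that committed "before" it as invisible.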
If anyone is interested, I can do a slightly longer write-up detailing
what I have worked out so far.
Ants Aasma
[1] http://archives.postgresql.org/message-id/CA%2BTgmoaAjiq%3Dd%3DkYt3qNj%2BUvi%2BMB-aRovCwr75Ca9egx-Ks9Ag%40mail.gmail.com
[2] http://archives.postgresql.org/message-id/CA%2BTgmoYD6EhYy1Rb%2BSEuns5smreY1_3rAMeL%3D76rX8deijy56Q%40mail.gmail.com
[3] http://archives.postgresql.org/message-id/CA%2BCSw_uDfg2SBMicGNu13bpr2upbnVL_edoTbzvacR1FrNrZ1g%40mail.gmail.com
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de