From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: advance local xmin more aggressively |
Date: | 2014-12-10 17:57:04 |
Message-ID: | 54888970.5020807@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/10/2014 06:56 PM, Robert Haas wrote:
> On Wed, Dec 10, 2014 at 9:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I guess this bears some further thought. I certainly don't like the
>> fact that this makes the whole system crap out at a lower number of
>> subtransactions than presently. The actual performance numbers don't
>> bother me very much; I'm comfortable with the possibility that closing
>> a cursor will be some modest percentage slower if you've got thousands
>> of active savepoints.
>
> Here's a new version with two changes:
>
> 1. I changed the traversal of the resource owner tree to iterate
> instead of recurse. It now does a depth-first, pre-order traversal of
> the tree; when we reach the last child of a node, we follow its parent
> pointer to get back to where we were. That way, we don't need to keep
> anything on the stack. That fixed the crash at 100k cursors, but it
> was still 4x slower.
Clever. Could we use that method in ResourceOwnerReleaseInternal and
ResourceOwnerDelete, too? Might be best to have a
ResourceOwnerWalk(resowner, callback) function for walking all resource
owners in a tree, instead of one for walking the snapshots in them.
> 2. Instead of traversing the tree until we find an xmin equal to the
> one we're currently advertising, the code now traverses the entire
> tree each time it runs. However, it also keeps a record of how many
> times the oldest xmin occurred in the tree, which is decremented each
> time we unregister a snapshot with that xmin; the traversal doesn't
> run again until that count reaches 0. That fixed the performance
> regression on your test case.
>
> With a million subtransactions:
>
> master 34.464s 33.742s 34.317s
> advance-xmin 34.516s 34.069s 34.196s
Well, you can still get the slowness back by running other stuff in the
background. I admit that it's a very obscure case, probably fine in
practice. I would still feel better if snapmgr.c did its own
bookkeeping, from an aesthetic point of view. In a heap, or even just a
linked list, if the performance characteristics of that is acceptable.
It occurs to me that the pairing heap I just posted in another thread
(http://www.postgresql.org/message-id/54886BB8.9040000@vmware.com) would
be a good fit for this. It's extremely cheap to insert to and to find
the minimum item (O(1)), and the delete operation is O(log N),
amortized. I didn't implement a delete operation, for removing a
particular node, I only did delete-min, but it's basically the same
code. Using a pairing heap for this might be overkill, but if we have
that implementation anyway, the code in snapmgr.c to use it would be
very simple, so I see little reason not to. It might even be simpler
than your patch, because you wouldn't need to have the heuristics on
whether to attempt updating the xmin; it would be cheap enough to always
try it.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2014-12-10 18:13:36 | Re: On partitioning |
Previous Message | Andres Freund | 2014-12-10 17:10:51 | Re: logical column ordering |