Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
Date: 2017-01-19 21:32:27
Message-ID: 441fce2a-9eac-3fe8-9216-9ddd8239cb28@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/19/17 10:06 AM, Alvaro Herrera wrote:
>> Also, I wonder whether we should not in vacuum.c change the order of the
>> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
>> to keep everything consistent.
>
> I am wary of doing that. The current coding is well battle-tested by
> now, but doing things in the opposite order, not at all. Pending some
> testing to show that there is no problem with a change, I would leave
> things as they are.

I appreciate this hesitation.

The issue the patch under discussion is addressing is that we have a
user-space interface to read information about commit timestamps. So if
we truncate away old information before we update the mark where the
good information starts, then we get the race. We don't have a
user-space interface reading, say, the clog, but it doesn't seem
implausible for that to exist at some point. How would this code have
to be structured then?

> What I fear is: what
> happens if a concurrent checkpoint reads the values between the two
> operations, and a crash occurs? I think that the checkpoint might save
> the updated values, so after crash recovery the truncate would not be
> executed, possibly leaving files around. Leaving files around might be
> dangerous for multixacts at least (it's probably harmless for xids).

Why is leaving files around dangerous? If this is a problem, why is it
not a problem for commit timestamps? I don't understand why these
different SLRU uses are different.

Yeah, we could go ahead with this patch as is and it might be fine, but
I feel like we don't understand the broader issue completely yet.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-01-19 21:51:02 Re: Improving RLS planning
Previous Message Jorge Solórzano 2017-01-19 21:31:57 Re: [HACKERS] SEGFAULT in HEAD with replication