From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: lazy snapshots? |
Date: | 2010-10-21 03:11:22 |
Message-ID: | AANLkTinKvBLmp0yzeZh6JhWg-cXf5+cajKcLsiXnYxMh@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It's necessary to convince ourselves not only that this has some
>> performance benefit but that it's actually correct. It's easy to see
>> that, if we never take a snapshot, all the tuple visibility decisions
>> we make will be exactly identical to the ones that we would have made
>> with a snapshot; the choice of snapshot in that case is arbitrary.
>> But if we do eventually take a snapshot, we'll likely make different
>> tuple visibility decisions than we would have made had we taken the
>> snapshot earlier. However, the decisions that we make prior to taking
>> the snapshot will be consistent with the snapshot, and we will
>> certainly see the effects of all transactions that committed before we
>> started. We may also see the effects of some transactions that commit
>> after we started, but that is OK: it is just as if our whole
>> transaction had been started slightly later and then executed more
>> quickly thereafter.
>
> I don't think this is going to be acceptable at all. You're assuming
> that clients have no independent means of determining what order
> transactions execute in, which isn't the case. It would be quite
> possible, for example, for a query submitted to one backend to see the
> effects of a transaction that was submitted to another backend long
> after the first query started. If the two clients involved interact
> at all, they're not going to be happy. Even if they just compare
> transaction timestamps, they're not going to be happy.
I'm not sure they're entitled to rely on any other behavior. Couldn't
the exact same thing happen in a non-MVCC database based on SS2PL?
> I'm less than convinced by the hypothesis that most transactions would
> avoid taking snapshots in this regime, anyway. It would only hold up
> if there's little locality of reference in terms of which tuples are
> getting examined/modified by concurrent transactions, and that's a
> theory that seems much more likely to be wrong than right.
There will certainly be workloads where most transactions acquire a
snapshot, but just to take one obvious example, suppose we have a data
warehouse where every night we bulk load the day's data, and then we
run reporting queries all day. Except during the overnight bulk
loads, there's no concurrent write activity at all, and thus no need
for snapshots. Or imagine a database where we store monitoring data.
There's a continuous flow of monitoring data from multiple sources;
and then people run reports. The users running reports will need
snapshots, but the processes updating the monitoring data will
presumably be touching discrete sets of tuples. They may be
INSERT-only, and even if they do updates, the process monitoring
resource A only needs to look at the rows for resource A, not the rows
for resource B. If the tables are large enough that index scans are
used and the threshold XID is updated sufficiently frequently, you
might get away without snapshots. This isn't quite so clear a win as
the first one but maybe it's worth thinking about.

One thing we could do is instrument the current code to track whether
any field of the snapshot other than snapshot->xmin is ever used, and
then run some benchmarks to see how often that happens.
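
As a rough illustration of that instrumentation idea, here is a toy Python model, not PostgreSQL code. It assumes a snapshot made of xmin, xmax and a set of in-progress XIDs, ignores XID wraparound and subtransactions, and uses hypothetical counter names (xmin_only_checks, other_field_checks). It only shows what "was any field other than xmin ever used?" could mean in practice.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Toy snapshot; field names loosely mirror the ones discussed above."""
    xmin: int        # every XID below this had finished at snapshot time
    xmax: int        # every XID at or above this had not yet started
    xip: frozenset   # XIDs in progress at snapshot time (xmin <= xid < xmax)
    # Hypothetical instrumentation counters, not part of any real struct.
    xmin_only_checks: int = 0
    other_field_checks: int = 0

    def xid_in_progress(self, xid: int) -> bool:
        """Return True if xid counts as still running for this snapshot."""
        if xid < self.xmin:
            # Decided by xmin alone: the case a lazy snapshot could skip.
            self.xmin_only_checks += 1
            return False
        # Anything past this point needs xmax or the in-progress list.
        self.other_field_checks += 1
        if xid >= self.xmax:
            return True
        return xid in self.xip
```

If other_field_checks stays at zero for most transactions in a benchmark, that would suggest those transactions never needed more than a threshold XID.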
> I wonder whether we could do something involving WAL properties --- the
> current tuple visibility logic was designed before WAL existed,
Wow.
> so it's
> not exploiting that resource at all. I'm imagining that the kernel of a
> snapshot is just a WAL position, ie the end of WAL as of the time you
> take the snapshot (easy to get in O(1) time). Visibility tests then
> reduce to "did this transaction commit with a WAL record located before
> the specified position?". You'd need some index data structure that made
> it reasonably cheap to find out the commit locations of recently
> committed transactions, where "recent" means "back to recentGlobalXmin".
> That seems possibly do-able, though I don't have a concrete design in
> mind.
Interesting. O(1) snapshots would be great. I need to think about
this more before commenting on it, though.
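
To make the WAL-position idea concrete, here is a minimal Python sketch under heavy assumptions: WAL positions are plain integers, the commit-location index is an ordinary dict, and the class and method names (ToyWal, take_snapshot, committed_before) are invented for illustration. It is not a design for PostgreSQL, just the visibility rule from the quoted paragraph written out.

```python
class ToyWal:
    """Toy model: a snapshot is just the end-of-WAL position when taken."""

    def __init__(self):
        self.end_lsn = 0       # next WAL position to be written
        self.commit_lsn = {}   # xid -> position of its commit record

    def commit(self, xid):
        """Append a commit record for xid and return its position."""
        lsn = self.end_lsn
        self.commit_lsn[xid] = lsn
        self.end_lsn += 1
        return lsn

    def take_snapshot(self):
        """O(1): remember where the WAL currently ends."""
        return self.end_lsn

    def committed_before(self, xid, snapshot_lsn):
        """Did xid commit with a WAL record located before the snapshot?"""
        lsn = self.commit_lsn.get(xid)
        return lsn is not None and lsn < snapshot_lsn
```

In this model a transaction that commits after take_snapshot() gets a commit position at or beyond the snapshot value, so it stays invisible. The dict stands in for whatever index would map recent XIDs to commit locations, which is the part with no concrete design yet.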
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company