Re: PROC_IN_ANALYZE stillborn 13 years ago

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: PROC_IN_ANALYZE stillborn 13 years ago
Date: 2020-08-06 21:45:41
Message-ID: 20200806214541.4lyyzekds65hrlcq@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-08-06 16:22:23 -0400, Robert Haas wrote:
> On Thu, Aug 6, 2020 at 3:11 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > (1) Without a snapshot it's hard to make any non-bogus decisions about
> > which tuples are live and which are dead. Admittedly, with Simon's
> > proposal the final totals would be spongy anyhow, but at least the
> > individual decisions produce meaningful answers.
>
> I don't think I believe this. It's impossible to make *consistent*
> decisions, but it's not difficult to make *non-bogus* decisions.
> HeapTupleSatisfiesVacuum() and HeapTupleSatisfiesUpdate() both make
> such decisions, and neither takes a snapshot argument.

Yea, I don't think that's a big problem for the main table. As I just
mentioned in an email a few minutes ago, toast is a bit of a different
topic.

In fact, conceptually using something like a new snapshot for each sample
tuple actually seems like it'd be somewhat of an improvement over using a
single snapshot. Given that it's a sample, it's not like we have very
precise expectations of its exact contents, and publishing one that
solely consists of pretty old rows by the time we're done doesn't seem
like it's a meaningful improvement. I guess there's some danger of
distinctness estimates getting worse, by seeing multiple versions of the
same tuple - but they're notoriously inaccurate already, so I don't
think this changes much.
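
To make the distinctness concern concrete, here's a toy model. It is not
PostgreSQL code: TupleVersion, visible(), the two sample_* functions and
the integer "commit times" are all made up for illustration, and real
visibility checks are considerably more involved. It only shows that
with a single snapshot a row updated mid-scan is counted once, while
with a fresh visibility decision per tuple both versions can end up in
the sample.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TupleVersion:
    """Hypothetical stand-in for one physical version of a logical row."""
    row_id: int                 # logical row this version belongs to
    xmin_commit: int            # time the inserting transaction committed
    xmax_commit: Optional[int]  # time the deleting/updating transaction
                                # committed, or None if never deleted


def visible(version: TupleVersion, snapshot_time: int) -> bool:
    """Visible if the inserter committed at or before the snapshot time
    and the deleter (if any) had not committed by then."""
    if version.xmin_commit > snapshot_time:
        return False
    return version.xmax_commit is None or version.xmax_commit > snapshot_time


# Each scan entry is (time at which the scan reaches the version, version).
Scan = List[Tuple[int, TupleVersion]]


def sample_single_snapshot(scan: Scan, snapshot_time: int) -> List[TupleVersion]:
    """Judge every version against one snapshot taken up front."""
    return [v for _, v in scan if visible(v, snapshot_time)]


def sample_fresh_snapshots(scan: Scan) -> List[TupleVersion]:
    """Judge every version as of the moment the scan reads it."""
    return [v for read_time, v in scan if visible(v, read_time)]


# Row 1 is updated by a transaction that commits at time 5. The scan reads
# the old version at time 3 and the new version at time 7.
SCAN: Scan = [
    (3, TupleVersion(row_id=1, xmin_commit=0, xmax_commit=5)),
    (7, TupleVersion(row_id=1, xmin_commit=5, xmax_commit=None)),
]

single = [v.row_id for v in sample_single_snapshot(SCAN, snapshot_time=1)]
fresh = [v.row_id for v in sample_fresh_snapshots(SCAN)]
```

Under these assumptions the single-snapshot sample holds only the old
version of row 1, whereas the per-tuple variant holds both versions,
which is the case that could skew a distinct-value estimate.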

> > (2) I'm pretty sure there are places in the system that assume that any
> > reader of a table is using an MVCC snapshot. For instance, didn't you
> > introduce some such assumptions along with or just after getting rid of
> > SnapshotNow for catalog scans?
>
> SnapshotSelf still exists and is still used, and IIRC, it has very
> similar semantics to the old SnapshotNow, so I don't think that we
> introduced any really general assumptions of this sort. I think the
> important part of those changes was that all the code that had
> previously used SnapshotNow to examine system catalog tuples for DDL
> purposes and catcache lookups and so forth started using an MVCC scan,
> which removed one (of many) impediments to concurrent DDL. I think the
> fact that we removed SnapshotNow outright rather than just ceasing to
> use it for that purpose was mostly so that nobody would accidentally
> reintroduce code that used it for the sorts of purposes for which it
> had been used previously, and secondarily for code cleanliness.
> There's nothing wrong with it fundamentally AFAIK.

Some preaching to the choir:

IDK, there's not really much it (along with Self, Any, ...) can safely
be used for, unless you have pretty heavyweight additional locking, or
look explicitly at exactly one tuple version. Were it not that it's
probably unnecessary, and that there are some disaster recovery
benefits, I'd be in favor of prohibiting most snapshot types for
[sys]table scans.

I'm doubtful that using the term "snapshot" for any of these is a good
choice, and I don't think there's benefit in actually going through the
snapshot APIs. Especially not when, like *Dirty, they abuse fields
inside SnapshotData to return data that can't be returned through the
normal API. It'd probably be better to have more explicit APIs for
these, rather than going through snapshot.

> It's not clear to me that it would even be correct to categorize those
> somewhat-different results as "less accurate." Tuples that are
> invisible to a query often have performance consequences very similar
> to visible tuples, in terms of the query run time.

+1

Greetings,

Andres Freund
