From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prepared statements fail after schema changes with surprising error
Date: 2013-01-23 15:13:00
Message-ID: CA+TgmobCH=Rk5LeExoZGskKYANGmBjZOp4CGrWZodF284Jdw6A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 22, 2013 at 12:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> After reflecting on this a bit, I think that the problem may come from
> drawing an unjustified analogy between views and prepared statements.
> The code is certainly trying to treat them as the same thing, but
> perhaps we shouldn't do that.
>
> Consider that once you do
> create view v as select * from s.t;
> the view will continue to refer to the same table object no matter what.
> You can rename t, you can rename s, you can move t to a different schema
> and then drop s, but the view still knows what t is, because the
> reference is by OID. The one thing you can't do is drop t, because the
> stored dependency from v to t will prevent that (at least unless you let
> it cascade to drop v as well). Views therefore do not have, or need,
> any explicit dependencies on either specific schemas or their
> creation-time search_path --- they only have dependencies on individual
> objects.
>
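For concreteness, here's roughly what that looks like in a psql session (object names invented):

    create schema s;
    create table s.t (x int);
    create view v as select * from s.t;
    alter table s.t rename to t2;        -- the view keeps working; it stores t's OID
    alter table s.t2 set schema public;
    drop schema s;                       -- also fine; v never depended on s itself
    select * from v;                     -- still reads what is now public.t2
    drop table public.t2;                -- rejected (without CASCADE) by v's dependency

Renames and schema moves are invisible to the view; the one operation that could break it is blocked by the stored dependency.
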
> The current plancache code is trying, in a somewhat half-baked fashion,
> to preserve those semantics for prepared queries --- that's partly
> because it's reusing the dependency mechanism that was designed for
> views, and partly because it didn't occur to us to question that model.
> But it now strikes me that the model doesn't apply very well, so maybe
> we need a new one. The key point that seems to force a different
> treatment is that there are no stored (globally-visible) dependencies
> for prepared queries, so there's no way to guarantee that referenced
> objects don't get dropped.
>
> We could possibly set things up so that re-executing a prepared query
> that references now-dropped objects would throw an error; but what
> people seem to prefer is that it should be re-analyzed to see if the
> original source text would now refer to a different object. And we're
> doing that, but we haven't followed through on the logical implications.
> The implication, ISTM, is that we should no longer consider that
> referring to the same objects throughout the query's lifespan is a goal
> at all. Rather, what we should be trying to do is make the query
> preparation transparent, in the sense that you should get the same
> results as if you resubmitted the original query text each time.
>
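As I understand the current behavior, the re-analysis part already works for the simple drop-and-recreate case, e.g. (names invented):

    create table t (id int);
    prepare q as select * from t;
    execute q;                  -- uses the original t
    drop table t;
    create table t (id int);    -- new table, new OID, same name and shape
    execute q;                  -- no error: the original text is re-analyzed
                                -- and now refers to the new t
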
> In particular, it now seems to me that this makes a good argument
> for changing what plancache is doing with search_path. Instead of
> re-establishing the original search_path in a rather vain hope that the
> same objects will be re-selected by parse analysis, we should consider
> that the prepared query has a dependency on the active search path, and
> thus force a replan if the effective search path changes.
I think that's exactly right, and as Stephen says, likely to be a very
significant improvement over the status quo even if it's not perfect.
(Basically, I agree with everything Stephen said in his followup
emails down to the letter. +1 from me for everything he said.)
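In other words, something like the following session ought to give the same answers whether or not a PREPARE is involved (schemas and data invented):

    create schema a; create schema b;
    create table a.t (val text);  insert into a.t values ('a.t');
    create table b.t (val text);  insert into b.t values ('b.t');
    set search_path = a, b;
    prepare q as select * from t;
    execute q;                    -- 'a.t'
    set search_path = b, a;
    execute q;                    -- with the proposed rule this replans and returns 'b.t',
                                  -- exactly what resubmitting the query text would return
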
> I'm not sure that we can make the plan caching 100% transparent, though.
> The existing mechanisms will force replan if any object used in the plan
> is modified (and fortunately, "modified" includes "renamed", even though
> a rename isn't interesting according to the view-centric worldview).
> And we can force replan if the search path changes (ie, the effective
> list of schema OIDs changes). But there are cases where neither of
> those things happens and yet the user might expect a new object to be
> selected. Consider for example that the search path is a, b, c,
> and we have a prepared query "select * from t", and that currently
> refers to b.t. If now someone creates a.t, or renames a.x to a.t,
> then a replan would cause the query to select from a.t ... but there
> was no invalidation event that will impinge on the stored plan, and the
> search_path setting didn't change either. I don't think we want to
> accept the overhead of saying "any DDL anywhere invalidates all cached
> plans", so I don't see any good way to make this case transparent.
> How much do we care?
I'd just like to mention that Noah and I left this same case unhandled
when we did all of those concurrent DDL improvements for 9.2. A big
part of that work aimed at fixing the problem of a DML or DDL
statement latching onto an object that had been concurrently dropped,
as in the case where someone does BEGIN; DROP old; RENAME new TO old;
COMMIT; for any sort of SQL object (table, function, etc.). That code
is all now much more watertight than it was before, but the case of
creating an object earlier in the search path than an existing object
of the same name is still not guaranteed to work correctly - there's
no relevant invalidation message, so with the right timing of events,
you can still manage to latch onto the object that appears later in
the search path instead of the new one added to a schema that appears
earlier. So there is precedent for punting that
particularly-difficult aspect of problems in this category.
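Collapsed into a single session, the case looks something like this (assume schemas a and b already exist):

    set search_path = a, b;
    create table b.t (val text);
    prepare q as select * from t;   -- resolves to b.t
    execute q;                      -- reads b.t
    create table a.t (val text);    -- shadows b.t for newly parsed queries
    select * from t;                -- a fresh query now reads a.t
    execute q;                      -- still reads b.t: nothing invalidated the plan,
                                    -- and search_path didn't change
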
To make the cached-plan stuff truly safe against this case, you'd have
to replan whenever an object is created in a schema which appears
earlier in the search path than some object referenced by the query -
except for functions, where you also need to worry about a better
candidate showing up anywhere in the search path, before or after the
schema where the currently-used object appears. That's a lot of
replanning, but maybe not intolerable. For example, consider
search_path = a, b. If somebody creates an object in b, we don't need
to replan, unless it's a function. But creation of practically
anything in a is enough to force a replan. I'm not sure whether we
can optimize this that tightly, but if we can it could probably
eliminate most of the pain here, because in most cases people are
going to have a search path like $user, public, and most of the object
creation and deletion is going to happen in public, which doesn't pose
the hazard we're concerned about (again, except for object types that
allow overloading).
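The function case is nastier because of overload resolution; an exact match can capture a call away from a schema that appears earlier in the path. A sketch (again, invented names):

    set search_path = a, b;
    create function a.f(x numeric) returns text
        language sql as $$ select 'a.f(numeric)' $$;
    select f(1);                    -- only candidate: a.f, reached via int -> numeric
    create function b.f(x int) returns text
        language sql as $$ select 'b.f(int)' $$;
    select f(1);                    -- the exact match b.f(int) now wins, even though
                                    -- b comes *after* a in the search path

So a cached plan that latched onto a.f would need to be invalidated by function creation in any schema in the path, not just the ones ahead of a.
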
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company