From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Plan invalidation design |
Date: | 2007-02-17 17:48:33 |
Message-ID: | 20244.1171734513@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I'm starting to think about the long-wanted plan invalidation mechanism.
Here's a sketch --- anyone see any problems?
* Create a new module, say src/backend/utils/cache/plancache.c, that we
will put in charge of all long-lived plans --- or at least those cached by
PREPARE, plpgsql, and RI triggers. I'm unsure whether we should make all
SPI plans work this way or not; it's possible that doing so would change
SPI's API behavior enough to break user-written code. Any thoughts on
that?
* plancache.c will have two basic functions:
1. Given a query's raw parse tree (that is, the raw output of gram.y),
analyze and plan the query. Store both the parse tree and plan in a
backend-local cache table, and return a handle for the table entry as well
as the plan tree.
2. Given a handle for a previously stored query, check to see if the plan
is still up to date; if not, regenerate it from the raw parse tree (note
this could result in failure, eg if a column used by the query has been
dropped). Then return the plan tree.
We probably want to return a direct pointer to the cached plan tree
instead of making a copy. This should be safe, because the executor now
treats plan trees as read-only, but it does mean that when plan
invalidation occurs the cached plan tree might still be in use. We'll
probably need to have a notion of a reference count: so the two functions
above would increment the plan's refcount and there would be a third
"ReleasePlanCache" function to call when done using a plan (and, hence,
these references would need to be supported by the ResourceManager
mechanism).
Note that the source object for caching is a raw parse tree. This should
work since we already require that gram.y not look into the database
during its processing; therefore, the raw tree need never be invalidated.
It'd be conceptually simpler if we passed in a query string instead, but
I don't think that works for PREPARE, because it might be embedded in a
multi-command string. (We do probably want to pass in the original query
string too, if available, because it's needed for syntax error reporting.)
nodes/copyfuncs.c will need some expansion, as I don't believe it has
coverage for all raw-parse-tree node types.
Invalidation will be detected by having plancache.c watch for relcache
invalidation events, using the existing inval.c callback mechanism.
On any relcache inval, traverse the plan cache looking for plans that
mention the invalidated relation in their rangetables, and mark them as
needing to be regenerated before next use. (If they currently have
refcount zero, we could delete the plan part of the cache entry
immediately.)
Relcache inval casts a fairly wide net; for example, adding or dropping an
index will invalidate all plans using the index's table whether or not
they used that particular index, and I believe that VACUUM will also
result in a relcache inval due to updating the table's pg_class row.
I think this is a good thing though --- for instance, after adding an
index it seems a good idea to replan to see if the new index is useful,
and replanning after a VACUUM is useful if the table has changed size
enough to warrant a different plan. OTOH this might mean that plans on a
high-update-traffic table never survive very long because of autovacuum's
efforts. If that proves to be a problem in practice we can look at ways
to dial down the number of replans, but for the moment I think it's more
important to be sure we *can* replan at need than to find ways to avoid
replans.
Note that I'm currently intending to detect only relcache invals, not
changes to functions or operators used in the plan. (Relcache inval will
cover view redefinitions, though.) We could extend it to handle that
later, but it looks like a lot more mechanism and overhead for not a lot
of gain. AFAICS there are only three cases where there'd be a benefit:
* if you redefine an immutable function, any places where its result has
been pre-computed by constant-folding wouldn't get updated without inval.
* if you have a SQL function that's been inlined into a plan, a change
in the function definition wouldn't get reflected into the plan without
inval.
* if you alter a function and change its volatility property, that might
possibly affect the shape of plans that use the function (for instance
some optimization transformation might now be allowed or not).
To my memory none of these problems have been complained of from the
field. Making the cache module able to detect function-related
invalidations would be a bit of work --- for example, if a function has
been inlined, there is no recognizable reference to it at all in the plan
tree, so we'd have to modify the planner to track such things and report
them somehow. (The corresponding problem for views doesn't exist, because
there is still a rangetable entry for a view after it's been expanded.)
So I think this is a "maybe do someday" part, not something to do in the
first release.
One interesting point is that to avoid race conditions, the function that
checks for is-plan-update-required will have to acquire locks on the
tables mentioned in the plan before it can be sure there's not a pending
invalidation event on them. This doesn't seem like a problem, though it
might mean we want to refactor the executor API a bit to avoid duplicate
effort.
Comments?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Pavan Deolasee | 2007-02-17 18:22:24 | Re: wishlist items .. |
Previous Message | Peter Eisentraut | 2007-02-17 17:12:54 | Re: patch adding new regexp functions |