Re: Wrong results from function that selects from vier after "created or replace"

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Wrong results from function that selects from vier after "created or replace"
Date: 2020-10-13 06:15:40
Message-ID: 8B55AEA5-D1A5-4BCC-B840-E9DB59DB8BB5@yugabyte.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks, again, Tom. I believe that I now have the right understanding of the semantics of “immutable”. And I’m going to hold firm that the semantics definition cannot refer to aspects of the current, or possible future implementation.

1. The word is to be taken in the mathematical sense of “deterministic”. (This, of course, has nothing to do with any meaning that Oracle Database might give to it.) For example, from Wikipedia:

> a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output… Formally, a deterministic algorithm computes a mathematical function; a function has a unique value for any input in its domain, and the algorithm is a process that produces this particular value as output.

So this informs the meaning of “forever”. Another way to see it is that time cannot be in the picture. The algorithm simply is what it is, in some platonic sense of the term. Like sin(x).

2. When a Postgres function is marked “immutable”, the implementation has permission to cache the output value produced for any set of actuals—and to maintain the values in that cache literally for ever (never mind what is practically feasible). However, the cache must be attached to an existing function—and so it vanishes when the function is dropped. As long as the function survives (and survival includes living on over “create or replace”), the implementation has permission to evaluate the function by accessing the cached value for the actual(s) at hand rather than running the function’s body.

3. Nothing in #2 says that the values are guaranteed to be cached, or that a cached value for some set of actuals will in fact be accessed when those values are presented again.

4. Unless you promise that a function that you mark “immutable” meets the criteria set out in #1, you’ll risk getting wrong results.

5. if you do empirical tests that attempt to determine how the implementation does the caching, and uses cached values, then you will learn nothing—and you will risk drawing groundless conclusions. (Just like I did.)

On 12-Oct-2020, at 20:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Bryn Llewellyn <bryn(at)yugabyte(dot)com> writes:
> I see the now that my analysis was faulty. And that, moreover, I described Oracle Database’s meaning of “deterministic” wrongly. Yes indeed, both Postgres’s “immutable” and Oracle’s “deterministic” rule out sensitivity to database state. And “select” from table or a view must be defined to be reading database state—even when I happen to know that the view text specifies a result that does not depend on database state.

FWIW, when I said "database state" I meant to include the contents of the
system catalogs, not only user-defined tables. So redefining the view v
as you did counts as a database state change.

> I’m going you risk overtaxing your patience with a follow-up question to test what you’ve just said. What’s your stance on this?
> ...
> Should I understand that this outcome, while it might have wrongly seemed to me to support my bad mental model, actually tells me nothing? In other words, should I understand that the _definition_ of “immutable” would allow some future PG implementation justifiably still to return “42” after recompiling “my_constant()”.

Yeah. Whether you get "42" or "17" in this example is an implementation
artifact that could vary (and has varied, I think) across PG versions,
depending on the exact plan caching behavior being used. It's even
possible that the result would change in an asynchronous way within a
single session, since hard-to-predict cache flush events could cause the
plan in the calling query to be rebuilt.

> And should I understand that I might to continue to see “42” quite literally forever (even after pg-stop and pg-start) —until I actually drop the function “my_constant()”.

In our current implementation, you would not see the effects of the old
function contents persisting into new backend processes, *in an example of
this sort*. The reason why we insist that "immutable" means "no changes
for the foreseeable future" is that immutable functions are allowed in
index definitions, and there is no mechanism for rebuilding an index
after a behavioral change of a function that it depends on. As a simple
example, if you define f(x) = x+1 and then do

create index on mytab (f(mycol));

then a query such as "select * from mytab where f(mycol) = 42" will
return rows where mycol=41. If you then change f() so it returns x+2,
that query will still return rows where mycol=41, because the
corresponding index entries still contain 42. You'd need to manually
reindex in order to bring the index into sync with the new function
definition. From our standpoint, relaxing the definition of immutable
would entail that the database undertakes to make that sort of thing
happen transparently. While it's probably not impossible, it's not
something we care to invest the required amount of effort in.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2020-10-13 07:10:27 Re: BUG #16665: Segmentation fault
Previous Message Thomas Munro 2020-10-13 04:37:23 Re: BUG #16665: Segmentation fault