From: | Mike Christensen <mike(at)kitchenpc(dot)com> |
---|---|
To: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> |
Cc: | Ben Chobot <bench(at)silentmedia(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Materialized views in Oracle |
Date: | 2011-09-23 07:03:20 |
Message-ID: | CABs1bs1dVkU4SDyiSRhyOA95KjxorvaqA4903sW3tQq9X=e5UQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Sep 21, 2011 at 11:34 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
[snip]
> This can get complicated when you have triggers acting recursively on a
> table and it isn't always that easy to understand exactly what a trigger
> will see.
I do agree with most all your points. The value I got out of this
experience was the perspective I gained dealing with materialized
views and being forced to think about the logic behind their
implementation. As you said, this feature is friggen tough to do
right. I read up a bit about Microsoft SQL’s “indexed views” and they
too have a long list of things you can’t do, and from what I can tell,
they only support the idea of keeping them up to date on every commit.
So, since we’re not on this list to discuss how we can improve Oracle,
I’d like to not make that the focus of my response. I’d rather have a
discussion on what materialized views could mean in the Postgres
world. I think my expectations, now slightly more validated through
my realization of what’s possible with Oracle, would go something like
this:
1) Though I might have given the impression that a “manual complete
refresh” is not useful, I definitely see value in this especially for
data warehousing scenarios. However, I’d almost want to call this a
“snapshot” and not a “materialized view” – the two things are so
different, I think it warrants different syntax. I think the ability
to create a snapshot in time would be quite useful:
CREATE SNAPSHOT Foo AS
SELECT * FROM Bar;
Now, I have “Foo” as a record in time, and can refer to it as I could
any other table. I’d also like the ability to refresh it (via a
schedule or a trigger):
REFRESH SNAPSHOT Foo;
Snapshots would remember their underlying query and could thus easily
update at any time.
2) The huge feature here (which MS SQL implements as indexed views)
are views that automatically update as the data underneath them
changes. I’ve come to the conclusion that, while it’s impressive that
Oracle can do anything close to this and have it actually work pretty
well in a lot of scenarios, Oracle simply can’t have an
all-encompassing knowledge of exactly how my database works and where
all the data comes from. You know who does have that knowledge? Me.
I believe I should have the power to instruct Postgres exactly when
and how to update my view in situations where it cannot be
automatically ascertained through the engine, rather than the DB
saying “Sorry I can’t be perfect thus you can’t do that.” For me, I
see this “chore” as a lot more appealing than trying to figure out why
I can’t make the view that I want.
I expect to be able to create two kinds of materialized views: Ones
that COMPLETE refresh any time any referred column changes, and ones
that do a FAST refresh. If I specify FAST but the engine can’t infer
what it needs, I should get a NOTICE and it should fall back to a
COMPLETE. If I specify nothing, it should create FAST if it can, and
if not fall back to COMPLETE without notice.
When creating a materialized view, I believe warnings should be issued
when a column’s source cannot be inferred by the parser, but at the
risk of being controversial, I think I should still be able to create
the view anyway. I’ve always been partial to systems that allow you
to shoot yourself in the foot. I could see doing something like:
CREATE MATERIALIZED VIEW Foo AS
SELECT ID, Name FROM Users;
Postgres knows that ID is a primary key, and can thus update the view
when Users changes. Had I not put in a primary key, I think one
should be generated for me automatically based on every primary key in
the referred tables. If tables do not have primary keys, you’d get a
warning that the view has to be re-created on any change to the
referred keyless tables.
CREATE MATERIALIZED VIEW Foo AS
SELECT State, AVG(Age) FROM Users GROUP BY State;
Ok this is a tough one to figure out. Since the Age column is
referred to, we could force a refresh every time Age in any row
changes. In theory, the engine could be smart enough to realize the
aggregate age came from a group, and thus update the appropriate
“State” row when any Age within that state changed. Wow, this is
getting tough; I see why Oracle just said no way on aggregate
functions.
CREATE MATERIALIZED VIEW Foo AS
SELECT ID, GetStateFromZip(ZipCode) as State FROM Users;
Ouch, a function. Well, the engine could look at the volatility of my
GetStateFromZip function and make an assumption that passing in value
X would always result in Y for nonvolatile functions. Then, update
row ID when ZipCode changes. However, if the function is volatile or
the data that the function itself uses changes, then we’d definitely
run into issues. Two options: One, you’d issue a NOTICE and say
something like “The source of column ‘State’ cannot be inferred.” and
create the view anyway. Maybe there could be some STRICT option or
pragma to simply not allow creating the dangerous views that could get
stale. Another option, the user could provide the insight the engine
needs to create the view.
CREATE MATERIALIZED VIEW Foo
REFRESH ON (ZipCodes Z WHERE Z.Zip = Users.Zip) AS
SELECT ID, GetStateFromZip (ZipCode) as State FROM Users;
Now we’ve told the engine that when any column in the ZipCodes table
changes (you could probably qualify this further), then to update the
materialized view of any Users row with that zipcode. Ok, this isn’t
100% ironed out but the idea is that I can provide instructions on how
to handle situations that cannot be inferred automatically.
I really like the idea of having the database do the best it can with
the data it has, and issue NOTICEs if your view won’t be perfect.
Then, allowing me to provide hints as to when and how to update the
pieces of data that cannot be automatically inferred if I want to get
around those warnings. To me, it seems more polite than insisting the
database knows all and when it doesn’t, denying the very existence of
the view completely.
One could argue that if you’re going this far, you’ve basically turned
the whole trigger mechanism inside out and one could already implement
this whole thing on Postgres 9 using real tables and triggers when the
appropriate data changes. This is something I’m struggling with as
well. Materialized views seem to be great for quick database
snapshots and very simple selects, but anything too complicated and
it’s either not an option, or you’d have to explain so much logic that
you’re better off using a series of triggers and writing a custom
solution anyway.
This makes me really question the fundamental use of materialized
views. In other words, what is a real solid scenario that you would
use one for that simply cannot be done currently using triggers? Or,
is it simply the “ease of creation” people are after? There must be
things I’m just not seeing, as this is the number one feature request
on postgresql.uservoice.com by a massive landslide, and I fail to
believe all those people are just “lazy.”
From | Date | Subject | |
---|---|---|---|
Next Message | hamann.w | 2011-09-23 07:45:51 | Re: looking for a faster way to do that |
Previous Message | Craig Ringer | 2011-09-23 06:26:51 | Re: 10GbE / iSCSI storage for postgresql. |