From: | denty <denty(at)QQdd(dot)eu> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Implementing Incremental View Maintenance |
Date: | 2018-12-31 10:41:15 |
Message-ID: | 1546252875009-0.post@n3.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Yugo.
> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.
Great. :-)
I think it would address an important gap in PostgreSQL’s feature set.
> 2. How to compute the delta to be applied to materialized views
>
> Essentially, IVM is based on relational algebra. Theorically, changes on
> base
> tables are represented as deltas on this, like "R <- R + dR", and the
> delta on
> the materialized view is computed using base table deltas based on "change
> propagation equations". For implementation, we have to derive the
> equation from
> the view definition query (Query tree, or Plan tree?) and describe this as
> SQL
> query to compulte delta to be applied to the materialized view.
We had a similar discussion in this thread
https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu,
and I’m very much in agreement that the "change propagation equations”
approach can solve for a very substantial subset of common MV use cases.
> There could be several operations for view definition: selection,
> projection,
> join, aggregation, union, difference, intersection, etc. If we can
> prepare a
> module for each operation, it makes IVM extensable, so we can start a
> simple
> view definition, and then support more complex views.
Such a decomposition also allows ’stacking’, allowing complex MV definitions
to be attacked even with only a small handful of modules.
I did a bit of an experiment to see if "change propagation equations” could
be computed directly from the MV’s pg_node_tree representation in the
catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly
to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL
representation (I played with JSONB), then the next problem is making sense
of the structures, and unfortunately amongst the many plan/path/tree utility
functions in the code base, I figured only a very few could be sensibly
exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea,
and I think it could be made to work, native code is the way to go at least
for now.
> 4. When to maintain materialized views
>
> [...]
>
> In the previous discussion[4], it is planned to start from "eager"
> approach. In our PoC
> implementaion, we used the other aproach, that is, using REFRESH command
> to perform IVM.
> I am not sure which is better as a start point, but I begin to think that
> the eager
> approach may be more simple since we don't have to maintain base table
> changes in other
> past transactions.
Certainly the eager approach allows progress to be made with less
infrastructure.
I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.
In the eager maintenance approache, we have to consider a race condition
where two
different transactions change base tables simultaneously as discussed in
[4].
I wonder if that nudges towards a logged approach. If the race is due to
fact of JOIN-worthy tuples been made visible after a COMMIT, but not before,
then does it not follow that the eager approach has to fire some kind of
reconciliation work at COMMIT time? That seems to imply a persistent queue
of some kind, since we can’t assume transactions to be so small to be able
to hold the queue in memory.
Hmm. I hadn’t really thought about that particular corner case. I guess a
‘catch' could be simply be to detect such a concurrent update and demote the
refresh approach by marking the MV stale awaiting a full refresh.
denty.
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
From | Date | Subject | |
---|---|---|---|
Next Message | David Steele | 2018-12-31 11:07:34 | Re: Add timeline to partial WAL segments |
Previous Message | Peter Eisentraut | 2018-12-31 09:50:20 | Re: Clean up some elog messages and comments for do_pg_stop_backup and do_pg_start_backup |