From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kevin Grittner <kgrittn(at)ymail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | matview scannability rehash (was Re: Drastic performance loss in assert-enabled build in HEAD) |
Date: | 2013-04-04 21:52:45 |
Message-ID: | 20130404215245.GA26736@tornado.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Subject updated to account for the wider topics now appearing.
On Wed, Apr 03, 2013 at 05:49:18PM -0400, Tom Lane wrote:
> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table. Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.
I can't see taking MVs in the direction of allowing arbitrary DML; that's what
tables are for. Users wishing to do that should keep using current methods:
CREATE VIEW mv_impl AS SELECT ...;
CREATE TABLE mv AS SELECT * FROM mv_impl;
-- REFRESH analogue: fancier approaches exist
BEGIN; TRUNCATE mv; INSERT INTO mv SELECT * FROM mv_impl; COMMIT;
If anything, I'd help these users by introducing mechanisms for obtaining a
TRUNCATE;INSERT with the bells and whistles of REFRESH MATERIALIZED VIEW.
Namely, bulk index rebuilds, skipping WAL, and frozen tuples.
> > ... Making sure that
> > the heap has at least one page if data has been generated seems
> > like a not-entirely-unreasonable way to do that, although there
> > remains at least one vacuum bug to fix if we keep it, in addition
> > to Tom's concerns.
>
> No. This is an absolute disaster. It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors. We *need* to get rid of that aspect of
> things. If you must have scannability state in version 0, okay, but
> it has to be a catalog property not this.
In large part, this ended up outside the catalogs due to key limitations of
the startup process. This isn't the first time we've arranged an unusual
dance for this reason, and it probably won't be the last. Sure, we could take
out unlogged MVs to evade the problem, but re-adding them will mean either
restoring relfilenode-based bookkeeping or attacking the startup process
limitation directly. There exist fundamental challenges to a direct attack,
like the potential inconsistency of system catalogs themselves. We could
teach pg_relation_is_scannable() that unlogged MVs are always non-scannable
during recovery, then somehow update system catalogs in all databases at the
end of recovery. Not a project for 9.3, and I wouldn't be surprised to see it
mushroom in complexity. The currently-committed approach is a good one given
the applicable constraints.
A slight variation on the committed approach would be to add a "_scannable"
relation fork. The fork would either be absent (not scannable if an MV) or
empty (possibly-scannable). Create it in CREATE MATERIALIZED VIEW sans WITH
NO DATA and in REFRESH MATERIALIZED VIEW. Remove it in TRUNCATE. When the
startup process reinitializes an unlogged relation, it would also remove any
_scannable fork. This has a few advantages over the current approach: VACUUM
won't need a special case, and pg_upgrade will be in a better position to blow
away all traces if we introduce a better approach. The disadvantage is an
extra inode per healthy MV. (Though it does not lead to a 9.3 solution, I'll
note that an always-present relation metapage would help here.)
Note that I said "possibly-scannable" -- the relfilenode-based indicator
(whether the committed approach or something else) doesn't need to remain the
only input to the question of scannability. If 9.5 introduces the concept of
age-based scannability expiration, the applicable timestamp could go in
pg_class, and pg_relation_is_scannable() could check both that and the
relfilenode-based indicator.
Concerning the original $SUBJECT, I would look at fixing the performance
problem by having pg_relation_is_scannable() use an algorithm like this:
1. Grab the pg_class entry from syscache. If it's not found, return NULL.
2. If it's not a matview, return false.
3. Lock the matview and try to open a relcache entry. Return NULL on failure.
4. Return the scannability as reported by the relcache.
This boils down to the CASE statement you noted upthread, except putting the
fast-exit logic in the function rather than in its caller(s).
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2013-04-04 22:00:39 | Re: Drastic performance loss in assert-enabled build in HEAD |
Previous Message | Tom Lane | 2013-04-04 21:52:32 | Re: Clang compiler warning on 9.3 HEAD |