Re: Drastic performance loss in assert-enabled build in HEAD

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Drastic performance loss in assert-enabled build in HEAD
Date: 2013-04-04 18:01:26
Message-ID: 1365098486.46185.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Early versions of the matview patch had a relisvalid flag in
pg_class to determine whether the relation was scannable.  The name
was chosen based on a similarity to the purpose of indisvalid,
although the proliferation of new bools for related issues left me
wondering if a "char" would be a better idea.  Based on on-list
reviews I removed that in favor of basing the state on a
zero-length heap file, in an attempt to work better with unlogged
matviews.  I can easily look back through my development branch to
find the commits which made this change and revert them if this
approach is preferred.

I realize this would need to be Real Soon Now, but before reverting
to the earlier code I want to allow a *little* more time for
opinions.

Responses to particular points below.

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Granting that throwing an error is actually of some use to some people,
> I would not think that people would want to turn it on via a command
> that throws away the existing view contents altogether, nor turn it off
> with a full-throated REFRESH.

There is no doubt that in future versions we will want to be able
to disallow scans based on other criteria than just whether the
data is valid as of *some* point in time.  I can't imagine a
circumstance under which we would want to allow scans if it
doesn't.  So, at least for this release and as a default for future
releases, I think it makes sense that a matview last CREATEd or
REFRESHed WITH NO DATA should not be scannable.  Additional knobs
for the users to control this can and should be added in future
releases.

> There are going to need to be ways to incrementally update
> matviews, and ways to disable/enable access that are not tied to
> a complete rebuild, not to mention being based on user-determined
> rather than hard-wired criteria for what's too stale.

Absolutely.  So far as I know, nobody has ever suggested or
expected otherwise.  This paper provides a useful summary of
techniques for incremental updates, with references to more
detailed papers:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf

I expect to be working on implementing the most obvious special
cases and a "catch-all" general solution for 9.4.  Efficient
incremental update seems to depend on the ability to have at least
one new "hidden" system column, so once we get to a good point for
discussing 9.4 issues, I'll be on about that.

> If you feel that scannability disable is an absolute must for version 0,
> let's invent a matview reloption or some such to implement it and let
> users turn it on and off as they wish.  That seems a lot more likely
> to still be useful two years from now.

What about the idea of a relisvalid bool or some "char" column in
pg_class?

> And if you're absolutely convinced that unlogged matviews mustn't
> work as I suggest, we can lose those from 9.3, too.

I was loath to do that, but as Nicolas points out, they really
aren't that interesting without incremental update.  Perhaps it is
better to drop those until next release and have a more
sophisticated way to deal with invalidation of those -- or as you
suggested, just punt them to be empty.  (I would hate to do that,
but it might be the lesser of evils.)

> What I'd actually rather see us spending time on right now is making
> some provision for incremental updates, which I will boldly propose
> could be supported by user-written triggers on the underlying tables
> if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
> matviews, and allowed them to operate on a matview's contents just like
> it was a table.  Now admittedly that would foreclose allowing matviews
> to be updatable in the updatable-view sense, but that's a feature I
> would readily give up if it meant users could build incremental update
> mechanisms this year and not two years down the road.

I think that should be the last resort, after we have a more
automated declarative way of maintaining the majority of cases.
Since I don't see too much there where using that technique with
matviews would give you more than you could do right now with
tables and triggers, hand-coding the incremental maintenance is
very low on my list of priorities.  In any event, I don't want to
rush into any such thing this close to release; that seems to me to
be clearly 9.4 material.

>> Individual judges have a "dashboard" showing such things as the
>> number of court cases which have gone beyond certain thresholds
>> without action

> If you need 100% accuracy, which is what this scenario appears to be
> demanding, matviews are not for you (yet).  The existing implementation
> certainly is nowhere near satisfying this scenario.

No, we're talking about timelines measured in weeks or months.  A
nightly update is acceptable, although there are occasional gripes
by a judge that they would like to see the results of cleaning up
such cases sooner than the next morning.  An hour latency would
probably make them happy.  If async incremental update could give a
new view of things in five or ten minutes they would be overjoyed.

>> ... Making sure that the heap has at least one page if data has
>> been generated seems like a not-entirely-unreasonable way to do
>> that, although there remains at least one vacuum bug to fix if
>> we keep it, in addition to Tom's concerns.
>
> No.  This is an absolute disaster.  It's taking something we have always
> considered to be an irrelevant implementation detail and making it into
> user-visible DDL state, despite the fact that it doesn't begin to satisfy
> basic transactional behaviors.  We *need* to get rid of that aspect of
> things.

OK, I'm convinced.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vibhor Kumar 2013-04-04 18:03:18 Re: pg_dump selectively ignores extension configuration tables
Previous Message Stephen Frost 2013-04-04 17:54:24 Re: Hash Join cost estimates