Re: Drastic performance loss in assert-enabled build in HEAD

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Drastic performance loss in assert-enabled build in HEAD
Date: 2013-04-03 20:45:02
Message-ID: 1365021902.27661.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>>> In fact, I'm going to go further and say that I do not like the
>>> entire concept of scannability, either as to design or
>>> implementation, and I think we should just plain rip it out.
>
>> This has been my feeling from the beginning, so I'm happy to
>> support this position.  I think the current version - where
>> scan-ability is tracked in just one way - is an improvement over
>> the previous version of the patch - where it was tracked in two
>> different ways with a confusing shuffle of information from one
>> place to the other. But my favorite number of places to track it
>> would be zero.
>
> To be clear, I think we'll end up tracking some notion of
> scannability eventually.  I just don't think the current notion
> is sufficiently baked to want to promise to be upward-compatible
> with it in future.

To be honest, I don't think I've personally seen a single use case
for matviews where they could be used if you couldn't count on an
error if attempting to use them without the contents reflecting a
materialization of the associated query at *some* point in time.
There probably are such, but I think removing this entirely takes
the percentage of use cases covered by the implementation in this
release down from 10% to 2%.

Consider where the Wisconsin Courts use "home-grown" materialized
views currently:

(1) On the public web site for circuit court data, visibility is
based on supreme court rules and the advice of a committee
consisting of judges, representatives of the press, defense
attorneys, prosecuting attorneys, etc.  There are cases in the
database which, for one reason or another, should not show up on
the public web site.  On a weekly basis, where monitoring shows the
lowest usage, the table of cases which are "too old" to be shown
according to the rules thus determined is regenerated.  If there
was the possibility that a dump and load could fail to fill this,
and the queries would run without error, they could not move from
ad hoc matview techniques to the new feature without excessive
risk.

(2) Individual judges have a "dashboard" showing such things as the
number of court cases which have gone beyond certain thresholds
without action.  They can "drill down" to detail so that cases
which have "slipped through the cracks" can be scheduled for some
appropriate action.  "Justice delayed..." and all of that.  It
would be much better to get an error which would result in
"information currently unavailable" than to give the impression
that there are no such cases.

Since the main point of this patch is to get the basis for a more
complete matview feature into place while still supporting *some*
use cases, and a high priority needs to be place on not painting
ourselves into a corner, I agree we should rip this out if we think
it does so.  I have spent some time looking at what we will want to
add in future releases, and a more sophisticated concept of what is
"fresh" enough to allow use of the materialized data is certainly
on the list, and falling back to running the underlying query like
a "normal" view would be a reasonable option to be able to choose
in some cases; but I think that will *always* need to start with
information about whether data *has* been generated, and an empty
set has to be considered valid data if it has been.  If we come up
with a way to track that which isn't too fragile, I'm confident
that it will remain useful as the feature evolves. Making sure that
the heap has at least one page if data has been generated seems
like a not-entirely-unreasonable way to do that, although there
remains at least one vacuum bug to fix if we keep it, in addition
to Tom's concerns.  It has the advantage of playing nicely with
unlogged tables.  If this is not going to be what we use long term,
do we have a clue what is?

Personally, I think it would be sad to reduce the use cases for
which matviews in 9.3 can be used to those which can tolerate an
error-free reference to a matview for which data has not been
generated, but I'll rip out support for that distinction if that is
the consensus.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-04-03 21:17:39 Re: c language functions
Previous Message Jeff Janes 2013-04-03 19:50:57 Re: commit dfda6ebaec67 versus wal_keep_segments