From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. |
Date: | 2013-07-23 16:27:34 |
Message-ID: | 1374596854.54580.YahooMailNeo@web162904.mail.bf1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hm. There seems to be more things that need some more improvement
> from a quick look.
>
> First, I have my doubts of the general modus operandy of
> CONCURRENTLY here. I think in many, many cases it would be faster
> to simply swap in the newly built heap + indexes in concurrently.
> I think I have seen that mentioned in the review while mostly
> skipping the discussion, but still. That would be easy enough as
> of Robert's mvcc patch.
Yeah, in many cases you would not want to choose to specify this
technique. The point was to have a technique which could run
without blocking while a heavy load of queries (including perhaps a
very long-running query) was executing. If subsequent events
provide an alternative way to do that, it's worth looking at it;
but my hope was to get this out of the way to allow time to develop
incremental maintenance of matviews for the next CF. If we spend
too much time reworking this to micro-optimize it for new patches,
incremental maintenance might not happen for 9.4.
The bigger point is perhaps syntax. If MVCC catalogs are really at
a point where we can substitute a new heap and new indexes for a
matview without taking an AccessExclusiveLock, then we should just
make the current code do that. I think there is still a place for
a differential update for large matviews which only need small
changes on a REFRESH, but perhaps the right term for that is not
CONCURRENTLY.
> * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance
> be done a good bit earlier? We're executing queries before
> that.
This can't be in effect while we are creating temp tables. If
we're going to support a differential REFRESH technique, it must be
done more "surgically" than to wrap the whole REFRESH statement
execution.
> * The loop over indexes in refresh_by_match_merge should
> index_open(ExclusiveLock) the indexes initially instead of
> searching the syscache manually. They are opened inside the
> loop in many cases anyway. There probably aren't any hazards
> currently, but I am not even sure about that. The index_open()
> in the loop at the very least processes the invalidation
> messages other backends send...
Fair enough. That seems like a simple change.
> I'd even suggest using BuildIndexInfo() or such on the indexes,
> then you could use ->ii_Expressions et al instead of peeking
> into indkeys by hand.
Given that the function is in a source file described as containing
"code to create and destroy POSTGRES index relations" and the
comments for that function say that it "stores the information
about the index that's needed by FormIndexDatum, which is used for
both index_build() and later insertion of individual index tuples,"
and we're not going to create or destroy an index or call
FormIndexDatum or insert individual index tuples from this code, it
would seem to be a significant expansion of the charter of that
function. What benefits do you see to using that level?
> * All not explicitly looked up operators should be qualified
> using OPERATOR(). It's easy to define your own = operator for
> tid and set the search_path to public,pg_catalog. Now, the
> whole thing is restricted to talbe owners afaics, making this
> decidedly less dicey, but still. But if anyobdy actually uses a
> search_path like the above you can easily hurt them.
> * Same seems to hold true for the y = x and y.* IS DISTINCT FROM
> x.* operations.
> * (y.*) = (x.*) also needs to do proper operator lookups.
I wasn't aware that people could override the equality operators
for tid and RECORD, so that seemed like unnecessary overhead. I
can pull the operators from the btree AM if people can do that.
> * OpenMatViewIncrementalMaintenance() et al seem to be
> underdocumented.
If the adjacent comment for
MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain
about what OpenMatViewIncrementalMaintenance() and
CloseMatViewIncrementalMaintenance() are for, I can add comments.
At the time I wrote it, it seemed to me to be redundant to have
such comments, and would cause people to waste more time looking at
them then would be saved by anything they could add to what the
function names and one-line function bodies conveyed; but if there
was doubt in your mind about when they should be called, I'll add
something.
Would it help to move the static functions underneath the
(documented) public function and its comment?
> * why is it even allowed that matview_maintenance_depth is > 1?
> Unless I miss something there doesn't seem to be support for a
> recursive refresh whatever that would mean?
I was trying to create the function in a way that would be useful
for incremental maintenance, too. Those could conceivably recurse.
Also, this way bugs in usage are more easily detected.
> * INSERT and DELETE should also alias the table names, to make
> sure there cannot be issues with the nested queries.
I don't see the hazard -- could you elaborate?
> * I'd strongly suggest more descriptive table aliases than x, y,
> d. Those make the statements rather hard to parse.
I guess. Those are intended to be internal, but I guess there's no
reason not to be more verbose in the aliases.
> * SPI_exec() is deprecated in favor of SPI_execute()
According to what? The documentation of SPI_exec doesn't say
anything about it being deprecated; it does say:
| SPI_exec is the same as SPI_execute, with the latter's read_only
| parameter always taken as false.
I didn't see any particular benefit to using the more verbose form,
but it's certainly easy enough to change if people don't like the
shorter form.
> * ISTM deferred uniqueness checks and similar things should be
> enforced to run ub refresh_by_match_merge()
"ub"? That looks like a typo, but I'm not sure what you mean.
Could you show a case where there is a hazard?
There was also a mention of a temp table left around. I had
intended to specify ON COMMIT DROP for the internally used temp
tables, but apparently lost track of that.
I'll wait for the thread to settle down to offer a patch (or set of
patches).
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2013-07-23 18:08:19 | pgsql: Tweak FOR UPDATE/SHARE error message wording (again) |
Previous Message | Robert Haas | 2013-07-23 15:13:03 | pgsql: Use InvalidSnapshot, now SnapshotNow, as the default snapshot. |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2013-07-23 16:28:34 | Re: getting rid of SnapshotNow |
Previous Message | David E. Wheeler | 2013-07-23 16:24:07 | Re: Comma Comma Comma 8601 |