From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. |
Date: | 2013-07-31 01:39:14 |
Message-ID: | 20130731013914.GD19053@alap2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
Hi Kevin,
On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > Hm. There seems to be more things that need some more improvement
> > from a quick look.
> >
> > First, I have my doubts of the general modus operandy of
> > CONCURRENTLY here. I think in many, many cases it would be faster
> > to simply swap in the newly built heap + indexes in concurrently.
> > I think I have seen that mentioned in the review while mostly
> > skipping the discussion, but still. That would be easy enough as
> > of Robert's mvcc patch.
>
> Yeah, in many cases you would not want to choose to specify this
> technique. The point was to have a technique which could run
> without blocking while a heavy load of queries (including perhaps a
> very long-running query) was executing. If subsequent events
> provide an alternative way to do that, it's worth looking at it;
> but my hope was to get this out of the way to allow time to develop
> incremental maintenance of matviews for the next CF.
I thought there was a pretty obvious way to do this concurrently. But
after mulling over it for a few days I either miss a connection I made
previously or, more likely, I have missed some fundamental difficulties
previously.
I'd would be fairly easy if indexes were attached to a relations
relfilenode, not a relations oid... But changing that certainly doesn't
fall in the "easy enough" category anymore.
> The bigger point is perhaps syntax. If MVCC catalogs are really at
> a point where we can substitute a new heap and new indexes for a
> matview without taking an AccessExclusiveLock, then we should just
> make the current code do that. I think there is still a place for
> a differential update for large matviews which only need small
> changes on a REFRESH, but perhaps the right term for that is not
> CONCURRENTLY.
Hm. Yes. Especially as any CONCURRENTLY implementation building and
swapping in a new heap that I can think of requires to be run outside a
transaction (like CIC et al).
> > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance
> > be done a good bit earlier? We're executing queries before
> > that.
>
> This can't be in effect while we are creating temp tables. If
> we're going to support a differential REFRESH technique, it must be
> done more "surgically" than to wrap the whole REFRESH statement
> execution.
Sure, it cannot be active when creating the temp table. But it's
currently inactive while you SPI_execute queries. That can't be right.
> > I'd even suggest using BuildIndexInfo() or such on the indexes,
> > then you could use ->ii_Expressions et al instead of peeking
> > into indkeys by hand.
>
> Given that the function is in a source file described as containing
> "code to create and destroy POSTGRES index relations" and the
> comments for that function say that it "stores the information
> about the index that's needed by FormIndexDatum, which is used for
> both index_build() and later insertion of individual index tuples,"
> and we're not going to create or destroy an index or call
> FormIndexDatum or insert individual index tuples from this code, it
> would seem to be a significant expansion of the charter of that
> function. What benefits do you see to using that level?
I'd prevent you from needing to peek into indkeys. Note that it's
already used in various places.
> > * All not explicitly looked up operators should be qualified
> > using OPERATOR(). It's easy to define your own = operator for
> > tid and set the search_path to public,pg_catalog. Now, the
> > whole thing is restricted to talbe owners afaics, making this
> > decidedly less dicey, but still. But if anyobdy actually uses a
> > search_path like the above you can easily hurt them.
> > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM
> > x.* operations.
> > * (y.*) = (x.*) also needs to do proper operator lookups.
>
> I wasn't aware that people could override the equality operators
> for tid and RECORD, so that seemed like unnecessary overhead. I
> can pull the operators from the btree AM if people can do that.
Sure, You don't need any special privs for it:
CREATE OPERATOR public.= (
LEFTARG = tid,
RIGHTARG = tid,
PROCEDURE = tideq);
That's obviously not doing anything nefarious, since it calls the
builtin function, but you get the idea.
I think for the cases where you're comparing tids it's fine to just to
hardcode the operator to OPERATOR(pg_catalog.=).
> > * OpenMatViewIncrementalMaintenance() et al seem to be
> > underdocumented.
>
> If the adjacent comment for
> MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain
> about what OpenMatViewIncrementalMaintenance() and
> CloseMatViewIncrementalMaintenance() are for, I can add comments.
> At the time I wrote it, it seemed to me to be redundant to have
> such comments, and would cause people to waste more time looking at
> them then would be saved by anything they could add to what the
> function names and one-line function bodies conveyed; but if there
> was doubt in your mind about when they should be called, I'll add
> something.
I am not asking for comments that basically paraphrase the functionname,
I don't like those. More for something like:
"Normally DML statements aren't allowed on materialized views, but since
REFRESH CONCURRENTLY internally is implemented by running a bunch of
queries executing DML (c.f. refresh_by_match_merge()) we need to
temporarily allow it while refreshing. Callsites preventing DML and
similar on materialized views thus need to check whether we're currently
refreshing."
> Would it help to move the static functions underneath the
> (documented) public function and its comment?
Yes.
> > * why is it even allowed that matview_maintenance_depth is > 1?
> > Unless I miss something there doesn't seem to be support for a
> > recursive refresh whatever that would mean?
>
> I was trying to create the function in a way that would be useful
> for incremental maintenance, too. Those could conceivably recurse.
> Also, this way bugs in usage are more easily detected.
Also, that belongs in a comment.
> > * INSERT and DELETE should also alias the table names, to make
> > sure there cannot be issues with the nested queries.
>
> I don't see the hazard -- could you elaborate?
I don't think there is an actual hazard *right now*. But at least for
the DELETE case it took me a second or two to make sure there's no case
a matview called 'd' cannot cause problems.
> > * I'd strongly suggest more descriptive table aliases than x, y,
> > d. Those make the statements rather hard to parse.
>
> I guess. Those are intended to be internal, but I guess there's no
> reason not to be more verbose in the aliases.
Well, for one, other people will read that code every now and then. I am
not 100% convinced that all the statements are correct, and it's more
effort to do so right now. Also, those statements will show up in error
messages.
I first wanted to amend the above to say that they will also show up in
pg_stat_activity, but it doesn't look like it atm, you're never setting
anything. Which is consistent with other commands, even if it strikes me
as suboptimal.
> > * SPI_exec() is deprecated in favor of SPI_execute()
> According to what? The documentation of SPI_exec doesn't say
> anything about it being deprecated; it does say:
I only remembered the following comment from spi.c:
/* Obsolete version of SPI_execute */
I guess, if the docs don't show that...
> > * ISTM deferred uniqueness checks and similar things should be
> > enforced to run ub refresh_by_match_merge()
>
> "ub"? That looks like a typo, but I'm not sure what you mean.
> Could you show a case where there is a hazard?
I think I wanted to type "in" but moved one row to far to the
left... What I am thinking of is that you'll get a successfull REFRESH
CONCURRENTLY but it will later error out at COMMIT time because there
were constraint violations. Afaik we don't have any such behaviour for
existing DDL and I don't like introducing it.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2013-07-31 13:37:53 | pgsql: Fix inaccurate description of tablespace. |
Previous Message | Noah Misch | 2013-07-31 00:03:56 | pgsql: Restore REINDEX constraint validation. |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2013-07-31 02:14:34 | Re: Computer VARSIZE_ANY(PTR) during debugging |
Previous Message | Amit Langote | 2013-07-31 00:57:50 | Re: Computer VARSIZE_ANY(PTR) during debugging |