Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Date: 2013-08-05 15:37:57
Message-ID: 1375717077.37431.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Some of the issues raised by Andres and Noah have been addressed.
These all seemed simple and non-controversial, so I've just applied
the suggested fixes.

Some issues remain, such as how best to create the temp table used
for the "diff" data, and the related simplification of the security
context swapping that might become possible with a change there.
Review of that area has raised a couple other questions that I'm
looking into.  These all probably amount to enough that I will add
the patch(es) to address them to the next CF.

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-07-23 09:27:34 -0700, Kevin Grittner wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

>>> The loop over indexes in refresh_by_match_merge should
>>> index_open(ExclusiveLock) the indexes initially instead of
>>> searching the syscache manually. They are opened inside the
>>> loop in many cases anyway. There probably aren't any hazards
>>> currently, but I am not even sure about that. The index_open()
>>> in the loop at the very least processes the invalidation
>>> messages other backends send...

Fixed.

>>>   I'd even suggest using BuildIndexInfo() or such on the indexes,
>>>   then you could use ->ii_Expressions et al instead of peeking
>>>   into indkeys by hand.
>>
>> Given that the function is in a source file described as containing
>> "code to create and destroy POSTGRES index relations" and the
>> comments for that function say that it "stores the information
>> about the index that's needed by FormIndexDatum, which is used for
>> both index_build() and later insertion of individual index tuples,"
>> and we're not going to create or destroy an index or call
>> FormIndexDatum or insert individual index tuples from this code, it
>> would seem to be a significant expansion of the charter of that
>> function.  What benefits do you see to using that level?
>
> I'd prevent you from needing to peek into indkeys. Note that it's
> already used in various places.

I looked at where it was and wasn't used, and continue to be
skeptical.  For example, both techniques are used in tablecmds.c;
the technique you suggest is used where an index is being created,
dropped, or index tuples are being manipulated, while the
Form_pg_index structure is being used when the definition of the
index is being examined without directly manipulating it.  Compare
what is being done in my code with the existing code for
ATExecDropNotNull(), for example.

>>> [while comparison of indexed columns used OPERATOR() correctly,
>>> comparison of tid and rowvar values did not]

>> I wasn't aware that people could override the equality operators
>> for tid and RECORD

>> [example proving the possibility]

>> I think for the cases where you're comparing tids it's fine to just
>> to hardcode the operator to OPERATOR(pg_catalog.=).

Done.

>>> * I'd strongly suggest more descriptive table aliases than x, y,
>>>   d. Those make the statements rather hard to parse.
>>
>> I guess.  Those are intended to be internal, but I guess there's no
>> reason not to be more verbose in the aliases.
>
> Well, for one, other people will read that code every now and then. I am
> not 100% convinced that all the statements are correct, and it's more
> effort to do so right now. Also, those statements will show up in error
> messages.

Done.

> What I am thinking of is that you'll get a successfull REFRESH
> CONCURRENTLY but it will later error out at COMMIT time because there
> were constraint violations. Afaik we don't have any such behaviour for
> existing DDL and I don't like introducing it.

REFRESH MATERIALIZED VIEW CONCURRENTLY is definitely not DDL.  It
is DML, and behavior should be consistent with that.  (Note that
the definition of the matview remains exactly the same after the
statement executes as it was before; only the data is modified.)
Without the CONCURRENTLY clause it's in the same sort of gray area
as TRUNCATE TABLE, where it is essentially DML, but the
implementation details are similar to that of DDL, so it may
sometimes be hard to avoid DDL-like behaviors, even though it would
be best to do so.  We have no such problem here.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Andres Freund 2013-08-05 16:14:39 Re: [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Previous Message Kevin Grittner 2013-08-05 14:59:32 pgsql: Various cleanups for REFRESH MATERIALIZED VIEW CONCURRENTLY.

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2013-08-05 15:49:08 StrategyGetBuffer optimization, take 2
Previous Message Andres Freund 2013-08-05 15:22:28 Re: getting rid of SnapshotNow