RE: Implementing Incremental View Maintenance

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Tatsuo Ishii' <ishii(at)sraoss(dot)co(dot)jp>
Cc: "nagata(at)sraoss(dot)co(dot)jp" <nagata(at)sraoss(dot)co(dot)jp>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "amitlangote09(at)gmail(dot)com" <amitlangote09(at)gmail(dot)com>, "hoshiai(at)sraoss(dot)co(dot)jp" <hoshiai(at)sraoss(dot)co(dot)jp>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "kgrittn(at)gmail(dot)com" <kgrittn(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Implementing Incremental View Maintenance
Date: 2019-12-25 05:27:26
Message-ID: OSAPR01MB50731E416230EFEB22A58F2AFE280@OSAPR01MB5073.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
> AFAIK benefit of ON STATEMENT is the transaction can see the result of
> update to the base tables. With ON COMMIT, the transaction does not
> see the result until the transaction commits.
>
> Well, I can see use cases of IVM in both DWH and OLTP.
>
> For example, a user create a DWH-like data using materialized
> view. After the initial data is loaded, the data is seldom updated.
> However one day a user wants to change just one row to see how it
> affects to the whole DWH data. IVM will help here because it could be
> done in shorter time than loading whole data.

> I heard that REFRESH ON STATEMENT of Oracle has been added after ON
> COMMIT materialized view. So I suspect Oracle realizes that there are
> needs/use case for ON STATEMENT, but I am not sure.

Yes, it was added relatively recently in Oracle Database 12.2. As the following introduction to new features shows, the benefits are described as twofold:
1) The transaction can see the refreshed view result without committing.
2) The materialized view log is not needed.

I guess from these that the ON STATEMENT refresh mode can be useful when the user wants to experiment with some changes to see how data change could affect the analytics result, without persisting the change. I think that type of experiment is done in completely or almost static data marts where the user is allowed to modify the data freely. The ON STATEMENT refresh mode wouldn't be for the DWH that requires high-performance, regular and/or continuous data loading and maintenance based on a rigorous discipline. But I'm still not sure if this is a real-world use case...

https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/release-changes.html#GUID-2A2D6E3B-A3FD-47A8-82A3-1EF95AEF5993
--------------------------------------------------
ON STATEMENT refresh mode for materialized views
The ON STATEMENT refresh mode refreshes materialized views every time a DML operation is performed on any base table, without the need to commit the transaction. This mode does not require you to maintain materialized view logs on the base tables.
--------------------------------------------------

> Another use case is a ticket selling system. The system shows how many
> tickets remain in a real time manner. For this purpose it needs to
> count the number of tickets already sold from a log table. By using
> IVM, it could be accomplished in simple and effective way.

Wouldn't the app just have a table like ticket(id, name, quantity), decrement the quantity when the ticket is sold, and read the current quantity to know the remaining tickets? If many consumers try to buy tickets for a popular event, the materialized view refresh would limit the concurrency.

> Here are some use cases suitable for IVM I can think of:
>
> - Users are creating home made triggers to get data from tables. Since
> IVM could eliminates some of those triggers, we could expect less
> maintenance cost and bugs accidentally brought in when the triggers
> were created.
>
> - Any use case in which the cost of refreshing whole result table
> (materialized view) is so expensive that it justifies the cost of
> updating of base tables. See the example of use cases above.

I think we need to find a typical example of this. That should be useful to write the manual article, because it's better to caution users that the IMV is a good fit for this case and not for that case. Using real-world table names in the syntax example will also be good.

> > * Do you think the benefit of ON STATEMENT (i.e. do not have to use
> materialized view log) outweighs the drawback of ON STATEMENT (i.g. DML
> overhead)?
>
> Outweights to what?

"outweigh" means "exceed." I meant that I'm wondering if and why users prefer ON STATEMENT's benefit despite of its additional overhead on update statements.

Bottom line: The use of triggers makes me hesitate, because I saw someone's (probably Fujii san) article that INSERTs into inheritance-and-trigger-based partitioned tables were 10 times slower than the declaration-based partitioned tables. I think I will try to find a good use case.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-12-25 05:31:01 Re: unsupportable composite type partition keys
Previous Message Alexander Korotkov 2019-12-25 05:25:38 Re: Avoid full GIN index scan when possible