Re: support for MERGE

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Daniel Westermann <dwe(at)dbi-services(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: support for MERGE
Date: 2021-11-16 02:38:03
Message-ID: CA+HiwqF0CEYoWCDj71smZ9WiaZkbk_coj9mokXb+_q1=A6M8Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Álvaro,

On Tue, Nov 16, 2021 at 6:01 AM Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2021-Nov-14, Amit Langote wrote:
> > The only problem caused by the code block that follows the buggy if
> > statement unconditionally executing is wasted cycles. Fortunately,
> > there's no correctness issue, because rootRelInfo is the same as the
> > input result relation in the cases where the latter is not partitioned
> > and there'd be no map to convert the tuple, so the block is basically
> > a no-op. I was afraid about the case where the input relation is a
> > regular inheritance parent, though apparently we don't support MERGE
> > in that case to begin with.
>
> Well, I noticed that if we simply remove the ERROR that prevents that
> case, it works fine as far as I can tell.

Thanks for looking into that.

> Example (partly cribbed from
> the documentation):
>
> CREATE TABLE measurement (
> city_id int not null,
> logdate date not null,
> peaktemp int,
> unitsales int
> );
> CREATE TABLE measurement_y2006m02 (
> CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
> ) INHERITS (measurement);
> CREATE TABLE measurement_y2006m03 (
> CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
> ) INHERITS (measurement);
> CREATE TABLE measurement_y2007m01 (
> filler text,
> peaktemp int,
> logdate date not null,
> city_id int not null,
> unitsales int
> CHECK ( logdate >= DATE '2007-01-01' AND logdate < DATE '2007-02-01')
> );
> ALTER TABLE measurement_y2007m01 DROP COLUMN filler;
> ALTER TABLE measurement_y2007m01 INHERIT measurement;
>
> CREATE OR REPLACE FUNCTION measurement_insert_trigger()
> RETURNS TRIGGER AS $$
> BEGIN
> IF ( NEW.logdate >= DATE '2006-02-01' AND
> NEW.logdate < DATE '2006-03-01' ) THEN
> INSERT INTO measurement_y2006m02 VALUES (NEW.*);
> ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
> NEW.logdate < DATE '2006-04-01' ) THEN
> INSERT INTO measurement_y2006m03 VALUES (NEW.*);
> ELSIF ( NEW.logdate >= DATE '2007-01-01' AND
> NEW.logdate < DATE '2007-02-01' ) THEN
> INSERT INTO measurement_y2007m01 (city_id, logdate, peaktemp, unitsales)
> VALUES (NEW.*);
> ELSE
> RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
> END IF;
> RETURN NULL;
> END;
> $$ LANGUAGE plpgsql ;
> CREATE TRIGGER insert_measurement_trigger
> BEFORE INSERT ON measurement
> FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
> INSERT INTO measurement VALUES (1, '2006-02-10', 35, 10);
> INSERT INTO measurement VALUES (1, '2006-02-16', 45, 20);
> INSERT INTO measurement VALUES (1, '2006-03-17', 25, 10);
> INSERT INTO measurement VALUES (1, '2006-03-27', 15, 40);
> INSERT INTO measurement VALUES (1, '2007-01-15', 10, 10);
> INSERT INTO measurement VALUES (1, '2007-01-17', 10, 10);
>
> alvherre=# select tableoid::regclass, * from measurement order by city_id, logdate;
> tableoid | city_id | logdate | peaktemp | unitsales
> ----------------------+---------+------------+----------+-----------
> measurement_y2006m02 | 1 | 2006-02-10 | 35 | 10
> measurement_y2006m02 | 1 | 2006-02-16 | 45 | 20
> measurement_y2006m03 | 1 | 2006-03-17 | 25 | 10
> measurement_y2006m03 | 1 | 2006-03-27 | 15 | 40
> measurement_y2007m01 | 1 | 2007-01-15 | 10 | 10
> measurement_y2007m01 | 1 | 2007-01-17 | 10 | 10
>
>
> CREATE TABLE new_measurement (LIKE measurement);
> INSERT INTO new_measurement VALUES (1, '2006-03-01', 20, 10);
> INSERT INTO new_measurement VALUES (1, '2006-02-16', 50, 10);
> INSERT INTO new_measurement VALUES (2, '2006-02-10', 20, 20);
> INSERT INTO new_measurement VALUES (1, '2006-03-27', NULL, NULL);
> INSERT INTO new_measurement VALUES (1, '2007-01-17', NULL, NULL);
> INSERT INTO new_measurement VALUES (1, '2007-01-15', 5, NULL);
> INSERT INTO new_measurement VALUES (1, '2007-01-16', 10, 10);
>
> MERGE into measurement m
> USING new_measurement nm ON
> (m.city_id = nm.city_id and m.logdate=nm.logdate)
> WHEN MATCHED AND nm.peaktemp IS NULL THEN DELETE
> WHEN MATCHED THEN UPDATE
> SET peaktemp = greatest(m.peaktemp, nm.peaktemp),
> unitsales = m.unitsales + coalesce(nm.unitsales, 0)
> WHEN NOT MATCHED THEN INSERT
> (city_id, logdate, peaktemp, unitsales)
> VALUES (city_id, logdate, peaktemp, unitsales);
>
> alvherre=# select tableoid::regclass, * from measurement order by city_id, logdate;
> tableoid | city_id | logdate | peaktemp | unitsales
> ----------------------+---------+------------+----------+-----------
> measurement_y2006m02 | 1 | 2006-02-10 | 35 | 10
> measurement_y2006m02 | 1 | 2006-02-16 | 50 | 30
> measurement_y2006m03 | 1 | 2006-03-01 | 20 | 10
> measurement_y2006m03 | 1 | 2006-03-17 | 25 | 10
> measurement_y2007m01 | 1 | 2007-01-15 | 10 | 10
> measurement_y2007m01 | 1 | 2007-01-16 | 10 | 10
> measurement_y2006m02 | 2 | 2006-02-10 | 20 | 20
>
>
> Even the DELETE case worked correctly (I mean, it deletes from the right
> partition).
>
> So I'm considering adding this case to the regression tests and remove
> the limitation. If anybody wants to try some more sophisticated
> examples, I'll welcome proposed additions to the regression tests.
>
> (Don't get me wrong -- I don't think this is terribly useful
> functionality. I just think that if the code is already prepared to
> handle it, we may as well let it.)
>
> One thing I didn't quite like is that the count of affected tuples seems
> off, but IIRC that's already the case for trigger-redirected tuples in
> legacy-style partitioning, so I'm not too worried about that.

AFAICS, MERGE operating on an inheritance parent that is not
partitioned should work mostly the same as the case where it is
partitioned (good thing that it works at all without needing any
special code!), though only the INSERT actions would have to be
handled appropriately by the user using triggers and such. And also I
guess any UPDATE actions that need to move rows between child tables
because that too involves tuple routing logic. As long as we're clear
on that in the documentation, I don't see why this case should not be
covered in the initial version.

I thought for a second about the cases where child tables have columns
not present in the root parent mentioned in the command, but I guess
that possibility doesn't present problems given that the command
wouldn't be able to mention such columns to begin with; it can only
refer to the root parent's column which must be present in all of the
affected child tables. In any case, I have a feeling that the planner
would catch any problematic cases if there're any while converting
MergeAction expressions into the individual child table layouts.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-11-16 02:38:29 RE: parallel vacuum comments
Previous Message Michael Paquier 2021-11-16 02:34:40 Re: RecoveryInProgress() has critical side effects