Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE

From: David Mullineux <dmullx(at)gmail(dot)com>
To: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE
Date: 2025-01-09 15:52:45
Message-ID: CAGsyd8UeMRtmgrQZ1Tg7gMztr=pvF4Dp8mp0kGHzc9ct04FQhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 9 Jan 2025, 12:26 Feike Steenbergen, <feikesteenbergen(at)gmail(dot)com>
wrote:

> I'm trying to change a few applications to fully use this, as PostgreSQL
> 17 added this support.
>
> The application does something like this:
>
> - fetch information from a source system and store it in a temp table
> - run a MERGE with a table (target) in this database,
> updating, inserting and deleting in a single statement
> - the target table holds information for multiple systems
>
> The temp table (source) doesn't contain the identifier for the system, as
> we can inject that at runtime.
>
> This is the shape of the statement:
>
> MERGE INTO
> merge_target AS t
> USING
> merge_source AS s ON (t.time = s.time AND t.device_id = $1)
> WHEN MATCHED THEN
> UPDATE set
> value = s.value
> WHEN NOT MATCHED THEN
> INSERT (device_id, time, value) VALUES ($1, time, value)
> WHEN NOT MATCHED BY SOURCE
> AND t.device_id = $1
> THEN DELETE;
>
> If we run this however, there is always a Seq Scan against merge_target,
> whereas the filter of `AND t.device_id = $1` uses a (Bitmap) Index scan
> in other types of queries.
>
> Previously we would - in a CTE - do a delete *and then* the merge.
>
> Which performed much better as the DELETE would not do a Seq Scan.
>
> Attached a full reproducing test case. Some numbers:
>
> Previously, DELETE in CTE, then merge:
> Planning Time: 6.700 ms
> Execution Time: 7.473 ms
>
> Using the MERGE WHEN MATCHED ON SOURCE THEN DELETE:
> Planning Time: 0.994 ms
> Execution Time: 65.695 ms
>
> My questions are:
>
> - is the Seq Scan expected by others in this mailing list as well?
> - Is it required to do the Seq Scan?
> - is this something that could be optimized?
>
> Kind regards,
>
> Feike Steenbergen
>

An excellent post. I wish all posters provided nice contained example cases
like this one.

I note ,in the documentation, that a Warning box got added which says
this...

If both WHEN NOT MATCHED BY SOURCE and WHEN NOT MATCHED [BY TARGET] clauses
are specified, the MERGE command will perform a FULL join between
data_source and the target table. For this to work, at least one
join_condition subexpression must use an operator that can support a hash
join, or all of the subexpressions must use operators that can support a
merge join.

This could be a hint as to the reason maybe ? The NOT MATCHED BY SOURCE is
new feature to 17. I'm looking forward to others replies here.

>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2025-01-09 19:39:53 Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE
Previous Message Feike Steenbergen 2025-01-09 12:39:13 Re: Unexpected Seq Scan's when using MERGE WHEN NOT MATCHED BY SOURCE