Re: Check for duplicates before inserting new rows

From: Chris Sterritt <chris(dot)sterritt(at)yobota(dot)xyz>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Check for duplicates before inserting new rows
Date: 2020-09-04 15:51:04
Message-ID: eaeafafb-b01c-947e-07ab-7d374f1ac19c@yobota.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 04/09/2020 15:46, Rich Shepard wrote:
> On Fri, 4 Sep 2020, Olivier Gautherot wrote:
>
>> First of all, what version of PostgreSQL are you using?
>
> Olivier,
>
> 12.2.
>
>> One way would be to add a UNIQUE constraint and perform for each row
>> of the
>> source table a INSERT ... ON CONFLICT DO NOTHING (see
>> https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT)
>>
>> If it is a 2-way merge, I would encapsulate the process in a function
>> (which will create a transaction to protect your process) and add a
>> column
>> to trace the rows that have been merged. For this purpose, you can
>> use the
>> ON CONFLICT DO UPDATE ...
>
> Lots for me to ponder.
>
> The new data has attributes stored in two database tables: location and
> measurements. I'll need to think about how both could be handled in a
> single
> function. For example, the new data contains a row with a location not
> already in the existing location table. That location needs to be
> added to
> the location table and its associated measurement attributes then
> added to
> the existing measurements table.
>
> Thanks for the suggestion,
>
> Rich
>
>

Assuming some simple table structures (I've not included PK or FK
definitions for simplicity):

create table station (station_id integer,station_data text);

create table measurement (measurement_id bigserial,station_id integer,ameasurement text);

-- Add some test stations
insert into station(station_id,station_data)
values (1,'station1'),
(2,'station2'),
(3,'station3') ;

*Query to identify new measurements:
select station_id,ameasurement from (VALUES (1,'1meas1'), -- this represents your new test data set
(1,'1meas2'),
(2,'2meas1'),
(3,'3meas1'))as m(station_id,ameasurement)
except select station_id,ameasurement from measurement;

The query above will give you a list of the new values which are not currently stored in table measurement.
Run it and we get all 4 rows returned.

Add a measurement row:
insert into measurement(station_id,ameasurement)
values (2,'2meas1');

Now if you repeat the check for new measurements with the same query as at *, you only get 3 rows.

Cheers, Chris Sterritt

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Chris Sterritt 2020-09-04 16:13:34 Re: Check for duplicates before inserting new rows
Previous Message Sang Gyu Kim 2020-09-04 15:31:40 Can I get some advice regarding logical replication server?