Re: INSERT Trigger to check for existing records

From: Hagen Finley <hagen(at)datasundae(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: INSERT Trigger to check for existing records
Date: 2020-11-22 23:28:04
Message-ID: 204a1ad6-fd56-fa19-2211-cde267810b92@datasundae.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Adrian and Michael,

My current insert logic (which works) is in a psycopg2 python script
which reads a spreadsheet row into an array, so for the moment I didn't
want to add that integration to my struggle.

cur = conn.cursor()
\
query = "INSERT INTO
sfdc(theater,country,account,smotion,opname,cprod,opid,*ndealid,*qnum,*stage,revusd*,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate)

VALUES
(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"

cur.executemany(query, frecords)
conn.commit()
conn.close()

However, the following is something of a stepping stone towards the
destination and, (dare I say it? ;-) it works:

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
      (SELECT ndealid from hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
      UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

I changed the dealid to something that doesn't exist (14593030) in the
11-2 table and the function updates the 11-9 table.status field to "NEW":

sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid = 14593039;
UPDATE 1

SELECT same_test(14593039);

+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|repdate   |ndealid |revusd    |stage                 |status  |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|2020-11-09|14593039|1015624.23|Propose - 60% *|NEW * |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+

When I changed it back I get the proper "SAME" update:

sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid = 14593030;
UPDATE 1

SELECT same_test(14593039);

+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|repdate   |ndealid |revusd    |stage                 |status  |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|2020-11-09|14593039|1015624.23|Propose - 60%         |*SAME * |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+

I'm generally amazed when ANYTHING works so this is good news (to me).
It seems logical I could replace the UPDATE statement with an INSERT
statement at this point.

However, that only addresses one of the /data rules /on my checklist.

I'll keep forging ahead here and see what additional progress I can
attain. Very much appreciate your patient assistance here.

Best,

Hagen

On 11/22/20 11:26 AM, Adrian Klaver wrote:
> On 11/22/20 9:53 AM, Hagen Finley wrote:
>> Hello Michael,
>>
>> Thanks so much for this advice. As I mentioned previously, I'm not
>> very good at this yet, so forgive me if my response is obtuse. I
>> really love databases but my sentiments may be unrequited.
>
> The overriding issue is lack of a plan. From your first post:
>
> "Here’s what I (REALLY) want:
>
>
> Trigger looks at three fields prior to new insert: Deal ID (numeric),
> revenue (numeric), stage(char)     Example: 19743576 22072.37 Commit
> - 90%
>
>
>  1. If the NEW dealid doesn't match any of the OLD dealids, insert the
>     new row
>  2. if the NEW dealid, revenue and stage fields ALL match the OLD
>     dealid, revenue and stage, skip (don't insert the NEW row)
>  3. If the NEW dealid matches an OLD dealid but either the NEW revenue
>     OR the stage fields have changed (don't match OLD record) insert new
>     row (I'll review both rows manually)
> "
>
> And from later post:
>
> " I figure I could
> use the chk field to delete the new inserts I didn't need."
>
>
> From this I come up with the following:
>
> 1) Data rules
>
>   a) If dealid in new data does not exist in old data INSERT row.
>   b) Id dealid is in both new and old data AND revenue OR stage don't
>  match then INSERT and mark for review.
>   c) If new dealid, revenue, stage match old dealid, revenue, stage
> then do not INSERT.
>
> 2) Process the data. Choices
>    a) Use trigger on table sfdc
>    b) Use staging table to hold new data and then process into sfdc table
>
> 3) Process the data. Same basic principle for both choices in 2)
> Flowchart
>    a) In new data search for dealid in table sfdc if it does not exist
> add data to sfdc.
>    b) If new data dealid does exist in sfdc
>     1) If revenue or stage field differ mark for review
>     2) If they do match skip further processing
> 4) Thoughts about above.
>    a) To me table sfdc should only hold vetted data that is known to
> be unique per row.
>    b) The data for review  1)b) 3)b) should end up in another review
> table e.g. sfdc_review.
>    c) Since from OP ' 80% of the records are
> the same as the existing records from the week before.' it makes sense
> to use the staging table 2)b) process rather then throwing away a lot
> of INSERTs.
>
> If this makes sense then it comes down to decision in which choice in
> 2) to use. At that point it is filling in the flowchart with the exact
> steps to take.
>
>>
>> In reality my table has lots of columns (~30) including a report date
>> (repdate) and each week's pull has a new repdate ( in this case
>> 2020-11-02 and 2020-11-09) which could function as a "created on" field.
>>
>> To clarify, I would create an unique index on all the columns in the
>> old report records (2020-11-02)  or just the three I am comparing
>> (dealid,stage and revenue)?
>>
>> In either case, so far in my efforts it looks like the create index
>> fails because there are lots of rows with the same stage value, and a
>> few with the same revenue value.
>>
>> Create UNIQUE INDEX idx_sfdc
>> ON sfdc(ndealid, stage, revusd);
>>
>> ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid,
>> stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL
>> state: 23505
>>
>> I probably could create an unique index on the dealid column as that
>> should be unique. Would that be enough? It seems like that would
>> insert ONLY the new records with a new (unique) dealid and that would
>> definitely by an important step forward.
>>
>> I hesitate to admit I have no idea how I would code the "call insert
>> on conflict (unique index) do nothing" syntax, but I would be excited
>> to learn.
>>
>> Best,
>>
>> Hagen
>>
>>
>> On 11/22/20 8:54 AM, Michael Lewis wrote:
>>> If you can modify your insert statement, and live with an extra
>>> column in the data, no trigger is needed as best I can figure.
>>>
>>> Create a unique index over the existing columns, add a "created_on"
>>> field and call insert on conflict (unique index) do nothing.
>>>
>>> This should give the behavior you want.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Muthukumar.GK 2020-11-23 04:53:29 Multiple result set not displayed in PgAdmin4
Previous Message hagen 2020-11-22 19:11:05 RE: INSERT Trigger to check for existing records