Re: INSERT Trigger to check for existing records : Does this do what I hope it is doing?

From: Hagen Finley <hagen(at)datasundae(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: INSERT Trigger to check for existing records : Does this do what I hope it is doing?
Date: 2020-11-25 15:41:07
Message-ID: d7c2cf5b-d100-01d8-7e2e-d57a144420df@datasundae.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Folks,

Just a quick question. *Using this FUNCTION:*

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
      (SELECT dealid from hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
      UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

*Does the following query input the the dealids that result from the
SELECT statement into the parameter of the sames_test() FUNCTION?*

Select dealid sametest(dealid) FROM hygiene_123;

I doubt it does (my query runs a /long time)/ :-). I know I can utilize
python to push SELECT results into a array and then run a 'FOR d in
dealids' LOOP to feed the FUNCTION parameter but I'd like to learn how
to do that with nested SQL statements or FUNCTIONS.

Thanks!

Hagen

On 11/22/20 4:28 PM, Hagen Finley wrote:
>
> Adrian and Michael,
>
> My current insert logic (which works) is in a psycopg2 python script
> which reads a spreadsheet row into an array, so for the moment I
> didn't want to add that integration to my struggle.
>
> cur = conn.cursor()
> \
> query = "INSERT INTO
> sfdc(theater,country,account,smotion,opname,cprod,opid,*ndealid,*qnum,*stage,revusd*,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate)
>
> VALUES
> (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"
>
> cur.executemany(query, frecords)
> conn.commit()
> conn.close()
>
>
> However, the following is something of a stepping stone towards the
> destination and, (dare I say it? ;-) it works:
>
> CREATE OR REPLACE FUNCTION same_test(did numeric)
> RETURNS numeric AS $$
> BEGIN
>   IF $1 IN
>       (SELECT ndealid from hygiene_112)
>   THEN
>     UPDATE hygiene_119 SET paid = 'SAME';
>   ELSE
>       UPDATE hygiene_119 SET paid = 'NEW';
>   END IF;
> RETURN NULL;
> END;
> $$ LANGUAGE plpgsql;
>
>
> I changed the dealid to something that doesn't exist (14593030) in the
> 11-2 table and the function updates the 11-9 table.status field to "NEW":
>
> sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid =
> 14593039;
> UPDATE 1
>
> SELECT same_test(14593039);
>
> +----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
> |repdate   |ndealid |revusd    |stage                 |status  |
> +----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
> |2020-11-09|14593039|1015624.23|Propose - 60% *|NEW * |
> +----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
>
> When I changed it back I get the proper "SAME" update:
>
> sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid =
> 14593030;
> UPDATE 1
>
> SELECT same_test(14593039);
>
> +----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
> |repdate   |ndealid |revusd    |stage                 |status  |
> +----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
> |2020-11-09|14593039|1015624.23|Propose - 60%         |*SAME * |
> +----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
>
>
> I'm generally amazed when ANYTHING works so this is good news (to me).
> It seems logical I could replace the UPDATE statement with an INSERT
> statement at this point.
>
> However, that only addresses one of the /data rules /on my checklist.
>
> I'll keep forging ahead here and see what additional progress I can
> attain. Very much appreciate your patient assistance here.
>
> Best,
>
> Hagen
>
>
> On 11/22/20 11:26 AM, Adrian Klaver wrote:
>> On 11/22/20 9:53 AM, Hagen Finley wrote:
>>> Hello Michael,
>>>
>>> Thanks so much for this advice. As I mentioned previously, I'm not
>>> very good at this yet, so forgive me if my response is obtuse. I
>>> really love databases but my sentiments may be unrequited.
>>
>> The overriding issue is lack of a plan. From your first post:
>>
>> "Here’s what I (REALLY) want:
>>
>>
>> Trigger looks at three fields prior to new insert: Deal ID (numeric),
>> revenue (numeric), stage(char)     Example: 19743576 22072.37 Commit
>> - 90%
>>
>>
>>  1. If the NEW dealid doesn't match any of the OLD dealids, insert the
>>     new row
>>  2. if the NEW dealid, revenue and stage fields ALL match the OLD
>>     dealid, revenue and stage, skip (don't insert the NEW row)
>>  3. If the NEW dealid matches an OLD dealid but either the NEW revenue
>>     OR the stage fields have changed (don't match OLD record) insert new
>>     row (I'll review both rows manually)
>> "
>>
>> And from later post:
>>
>> " I figure I could
>> use the chk field to delete the new inserts I didn't need."
>>
>>
>> From this I come up with the following:
>>
>> 1) Data rules
>>
>>   a) If dealid in new data does not exist in old data INSERT row.
>>   b) Id dealid is in both new and old data AND revenue OR stage don't
>>  match then INSERT and mark for review.
>>   c) If new dealid, revenue, stage match old dealid, revenue, stage
>> then do not INSERT.
>>
>> 2) Process the data. Choices
>>    a) Use trigger on table sfdc
>>    b) Use staging table to hold new data and then process into sfdc
>> table
>>
>> 3) Process the data. Same basic principle for both choices in 2)
>> Flowchart
>>    a) In new data search for dealid in table sfdc if it does not
>> exist add data to sfdc.
>>    b) If new data dealid does exist in sfdc
>>     1) If revenue or stage field differ mark for review
>>     2) If they do match skip further processing
>> 4) Thoughts about above.
>>    a) To me table sfdc should only hold vetted data that is known to
>> be unique per row.
>>    b) The data for review  1)b) 3)b) should end up in another review
>> table e.g. sfdc_review.
>>    c) Since from OP ' 80% of the records are
>> the same as the existing records from the week before.' it makes
>> sense to use the staging table 2)b) process rather then throwing away
>> a lot of INSERTs.
>>
>> If this makes sense then it comes down to decision in which choice in
>> 2) to use. At that point it is filling in the flowchart with the
>> exact steps to take.
>>
>>>
>>> In reality my table has lots of columns (~30) including a report
>>> date (repdate) and each week's pull has a new repdate ( in this case
>>> 2020-11-02 and 2020-11-09) which could function as a "created on"
>>> field.
>>>
>>> To clarify, I would create an unique index on all the columns in the
>>> old report records (2020-11-02)  or just the three I am comparing
>>> (dealid,stage and revenue)?
>>>
>>> In either case, so far in my efforts it looks like the create index
>>> fails because there are lots of rows with the same stage value, and
>>> a few with the same revenue value.
>>>
>>> Create UNIQUE INDEX idx_sfdc
>>> ON sfdc(ndealid, stage, revusd);
>>>
>>> ERROR: could not create unique index "idx_sfdc" DETAIL: Key
>>> (ndealid, stage, revusd)=(19743576, Commit - 90% , 22072.37) is
>>> duplicated. SQL state: 23505
>>>
>>> I probably could create an unique index on the dealid column as that
>>> should be unique. Would that be enough? It seems like that would
>>> insert ONLY the new records with a new (unique) dealid and that
>>> would definitely by an important step forward.
>>>
>>> I hesitate to admit I have no idea how I would code the "call insert
>>> on conflict (unique index) do nothing" syntax, but I would be
>>> excited to learn.
>>>
>>> Best,
>>>
>>> Hagen
>>>
>>>
>>> On 11/22/20 8:54 AM, Michael Lewis wrote:
>>>> If you can modify your insert statement, and live with an extra
>>>> column in the data, no trigger is needed as best I can figure.
>>>>
>>>> Create a unique index over the existing columns, add a "created_on"
>>>> field and call insert on conflict (unique index) do nothing.
>>>>
>>>> This should give the behavior you want.
>>
>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2020-11-25 15:54:47 Re: Performance hit if I create multiple DBs on same instance
Previous Message Laurenz Albe 2020-11-25 14:59:06 Re: Performance hit if I create multiple DBs on same instance