Re: insert on conflict

From: armand pirvu <armand(dot)pirvu(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: insert on conflict
Date: 2017-06-27 20:43:35
Message-ID: D3E9A970-B985-4DAA-BE36-83E55B66FAD5@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> On Jun 27, 2017, at 3:30 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Jun 27, 2017 at 12:40 PM, armand pirvu <armand(dot)pirvu(at)gmail(dot)com> wrote:
>> so how is it working in fact ? Isn't it working like looping in the
>> IVEE.dim_company and for each company_id if the record does have a
>> correspondent in csischema.dim_company then update csischema.dim_company set
>> company_name = EXCLUDED.company_name where company_id=... ? If so isn't it
>> supposed to use the PK for each company_id ? Or is it more like building a
>> whole list from IVEE.dim_company and treat like a join ? Just trying to
>> understand
>
> The processing here has to happen a tuple at a time. That's because
> the index structure itself is the only authoritative source of truth
> about whether or not there is a possible conflict. An MVCC snapshot
> isn't good enough, because it sees a consistent view of things, not
> the true physical reality of what exists or does not exist in the
> index.
>
> What you end up with here is a weird nested loop join, you might say.
> The implementation couldn't do it any other way (this could never
> behave more like a merge join), because we must eagerly check for
> conflicts right as we insert (our insert would be registered by
> *other* inserters/upserters as a conflict). If the implementation did
> ever do it that way, it would break the important UPSERT guarantees
> around concurrency.
>
> MERGE does this in other systems, which is okay for those other
> systems because MERGE makes no special promises about concurrency
> (e.g., you can get a unique violation in the joined-on column with
> MERGE). But, MERGE would be faster for bulk loading, which is what
> MERGE is good for.
>
> --
> Peter Geoghegan

Hi Peter

So for example if IVEE.dim_company has 10k rows and csischema.dim_company has 40 rows, what will happen for each row in IVEE.dim_company we check csischema.dim_company and if the check tells row is in it switches to update and this would mean yes I scan IVEE.dim_company, however should an update be needed in csischema.dim_company it will use the csischema.dim_company PK since we pass one value gotten from IVEE.dim_company
The question I guess is what happens IF I IVEE.dim_company accounts for far more than 5% of csischema.dim_company ? Will that translate into a scan on csischema.dim_company ?

What I am looking at now looks like a potential racing contention which so I am wondering if there are better ways to do it

Thanks
Armand

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2017-06-27 21:26:04 Re: postgres: dbname dbuser 9.9.9.9[2222] PARSE waiting
Previous Message Peter Geoghegan 2017-06-27 20:30:16 Re: insert on conflict