From: | armand pirvu <armand(dot)pirvu(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: insert on conflict |
Date: | 2017-06-27 20:43:35 |
Message-ID: | D3E9A970-B985-4DAA-BE36-83E55B66FAD5@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> On Jun 27, 2017, at 3:30 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Jun 27, 2017 at 12:40 PM, armand pirvu <armand(dot)pirvu(at)gmail(dot)com> wrote:
>> so how is it working in fact ? Isn't it working like looping in the
>> IVEE.dim_company and for each company_id if the record does have a
>> correspondent in csischema.dim_company then update csischema.dim_company set
>> company_name = EXCLUDED.company_name where company_id=... ? If so isn't it
>> supposed to use the PK for each company_id ? Or is it more like building a
>> whole list from IVEE.dim_company and treat like a join ? Just trying to
>> understand
>
> The processing here has to happen a tuple at a time. That's because
> the index structure itself is the only authoritative source of truth
> about whether or not there is a possible conflict. An MVCC snapshot
> isn't good enough, because it sees a consistent view of things, not
> the true physical reality of what exists or does not exist in the
> index.
>
> What you end up with here is a weird nested loop join, you might say.
> The implementation couldn't do it any other way (this could never
> behave more like a merge join), because we must eagerly check for
> conflicts right as we insert (our insert would be registered by
> *other* inserters/upserters as a conflict). If the implementation did
> ever do it that way, it would break the important UPSERT guarantees
> around concurrency.
>
> MERGE does this in other systems, which is okay for those other
> systems because MERGE makes no special promises about concurrency
> (e.g., you can get a unique violation in the joined-on column with
> MERGE). But, MERGE would be faster for bulk loading, which is what
> MERGE is good for.
>
> --
> Peter Geoghegan
Hi Peter
So for example if IVEE.dim_company has 10k rows and csischema.dim_company has 40 rows, what will happen for each row in IVEE.dim_company we check csischema.dim_company and if the check tells row is in it switches to update and this would mean yes I scan IVEE.dim_company, however should an update be needed in csischema.dim_company it will use the csischema.dim_company PK since we pass one value gotten from IVEE.dim_company
The question I guess is what happens IF I IVEE.dim_company accounts for far more than 5% of csischema.dim_company ? Will that translate into a scan on csischema.dim_company ?
What I am looking at now looks like a potential racing contention which so I am wondering if there are better ways to do it
Thanks
Armand
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2017-06-27 21:26:04 | Re: postgres: dbname dbuser 9.9.9.9[2222] PARSE waiting |
Previous Message | Peter Geoghegan | 2017-06-27 20:30:16 | Re: insert on conflict |