Re: INSERT ... ON CONFLICT DO UPDATE

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: INSERT ... ON CONFLICT DO UPDATE
Date: 2015-07-19 18:08:45
Message-ID: 928925e4-d55b-4144-b994-5004e4b5a0e3@mm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Rafal Pietrak wrote:

> CREATE TABLE vouchers (a_voucher bigint PRIMARY KEY default
> (random()*1000000000)::bigint, issued date default now(), .....);

Generators of truly unique pseudo-random values provide a
better ground for this. Consider for example:

https://wiki.postgresql.org/wiki/Pseudo_encrypt

You may replace the round function with your own secret function,
so you'll get the required randomness, secrecy and uniqueness.
No need to deal with collisions on insertion as there are none.

> 2. with current (as of 9.5) implementation I think I can always "ON
> CONFLICT DO NOTHING", and retry the INSERT from application level.

Yes, but retrying is now easy, let's not underappreciate that.

As a test, with 9.5alpha1, I create a table with 100k unique
random numbers:

CREATE TABLE vouchers(id int primary key);

Then try to populate it immediately with 100k rows:

INSERT INTO vouchers
select (random()*1000000000)::int from
generate_series(1,100000)
ON CONFLICT DO NOTHING;

psql result: INSERT 0 99995

Note how 5 values conflicted right from the beginning,
even though we're claiming only 10^5 out of the 10^9 output
range (or 0.01%).
The probability of at least one collision is pretty high,
see the "birthday paradox" for the theory on that.

Anyway the collisions got eliminated without any effort
from me and that's quite useful already.

Now trying to insert 10k rows at a time:

INSERT INTO vouchers
SELECT (random()*1000000000)::int
FROM generate_series(1,10000)
ON CONFLICT DO NOTHING
RETURNING id;

when run repeatedly, it tends to return between 9995 and 10000 values.
If we want exactly N rows and we get back N-epsilon, then we need to
re-ask for epsilon rows, but this will converge fast to completion.
(that is, until you have enough values that the birthday paradox effect
really kicks in).

My point is that we can now achieve that without any exception handling
or transaction retry, and no plpgsql function to create, so it's really
a significant improvement in ease of use. And presumably in
performance too.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @ManitouMail

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Verite 2015-07-19 19:03:07 Re: INSERT ... ON CONFLICT DO UPDATE
Previous Message Amitabh Kant 2015-07-19 17:35:07 Upgrade postgres cluster on FreeBSD using pg_upgrade