Re: Efficiently advancing a sequence without risking it going backwards.

From: Paul McGarry <paul(at)paulmcgarry(dot)com>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Efficiently advancing a sequence without risking it going backwards.
Date: 2020-07-16 07:07:22
Message-ID: CAPrE0SZ+TLisSA9w6ztvKarWw0LcOrcBriF7HOBBvhTEHbgGBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 10 Jul 2020 at 10:27, Jeremy Schneider <schneider(at)ardentperf(dot)com>
wrote:

>
> OP asked for a way to call setval() with a guarantee the sequence will
> never go backwards IIUC. His code can check that the new value he wants to
> set is higher than the current value, but there’s a race condition where a
> second connection could quickly advance the sequence between the check and
> the setval() call and then cause duplicates from the next call which is bad.
>
> The ideal solution is a setval_forward_only() or setval_no_duplicates()
> function that does it atomically or something. If it were possible to
> “lock” the entire sequence to prevent any other sessions from using it at
> all, that would work too. Not locking a value, locking the whole thing.
> Very bad hack solution is renaming the sequence then renaming it back as a
> blunt form of locking... and to be clear I don’t think is a good idea I
> just was saying that technically it might work. :)
>
> -Jeremy
>

Yes, that first paragraph is a good summary. A "setval_forward_only()" is
the sort of thing I was after.

Maybe something analogous to:
UPDATE the_seq SET last_value = number WHERE last_value < number;
with some sort of global (but short) lock as required.

Relating to some of the other replies there isn't a "requirement" (from an
application perspective) that the sequences always generate ids in
ascending order or that they don't skip numbers etc. To the application
they are just ids, as long as they are unique that is enough. However it is
an application that is used by people, so there is some external value in
having the ids going up in a way that roughly correlates to time as people
tend to expect numbers to do that sort of thing.

For a bit more background, we have our own application and homegrown
loosely coupled multi-primary DB cluster and replication system.
Each backend DB in the cluster has its own node id (0-9) and when our app
asks for a sequence value it calls a custom function which gets a normal
sequence value suffixed with the DB node ID.

So if there were two backend dbs (1 and 2) and both backend dbs had a
sequence with last_value of 1234 then our application would get a
"sequence" value of 12351 or 12352 depending on which db backend served the
request.
The resulting ids are unique across our cluster, but certainly not gapless
nor issued in strict ascending order which is fine from an application
perspective.

But as mentioned, from a human perspective there is some value in keeping
the ids issued by the cluster roughly in time order, so we have a secondary
process which liaises with all the backend nodes and pulls forwards any
sequences that fall behind other nodes. So if DB 1 happened to serve 1000
requests using the sequence while DB2 served none, the process pulls the
sequence in DB2 forward until it catches up, currently by calling nextval
in a loop.

Which all works fine. However sometimes (eg taking a node offline for
maintenance or upgrade) a sequence might get quite a long way out, and
calling nextval() 100k times seems a rather inefficient way to catch up
(but it is better to be inefficient than risk going backwards and causing a
duplicate id).

We have been using essentially this system for our cluster since Postgres 7
days, periodically we have touched base with Postgres replication
advancements (which have come a long way) but haven't yet found a
compelling reason to switch from what is working.

Paul

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jehan-Guillaume de Rorthais 2020-07-16 11:03:17 Re: Clustering solution ?
Previous Message David Rowley 2020-07-15 22:29:59 Re: Same query taking less time in low configuration machine