Re: pg_advisory_lock problem

From: Rémi Cura <remi(dot)cura(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, John R Pierce <pierce(at)hogranch(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_advisory_lock problem
Date: 2014-08-12 10:07:53
Message-ID: CAJvUf_sUGA9FsMgDs3eDtTsgq2TefOVOtL41Wr5TmpsizKcS2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2014-08-11 22:48 GMT+02:00 Kevin Grittner <kgrittn(at)ymail(dot)com>:

> Rémi Cura <remi(dot)cura(at)gmail(dot)com> wrote:
>
> > as you (both?) suggested it works using advisory lock used at the
> > beginning and end of the transaction. This way there is no upsert
> > at all if the element is locked? (I used general advisory lockbut
> > in the same way as transactionnal lock)
>
> This is too vague to comment on.
>
> > The issue in this case is simple : I have to use about 100k
> > advisory locks, which is a big memory requirement for my hardware
> > :-(
>
> ... and that doesn't seem to make any sense. Either you are not
> understanding advisory locks or you are doing something very, very
> unusual.
>
> > Merlin I'm afraid I don't understand what is "vanilla" LOCK
> > TABLE.
>
> See the LOCK TABLE command.
>
> http://www.postgresql.org/docs/current/interactive/sql-lock.html
>
> http://www.postgresql.org/docs/current/interactive/explicit-locking.html
>
> > I can't really use a lock table because each query upsert
> > sequentially into 3 tables, doing lots of computing between.
>
> Now *that* I understand. :-) It's not an unusual requirement,
> but can be a challenge when using snapshot isolation (where writes
> don't block reads and reads don't block anything). There are two
> main approaches -- introduce blocking to serialize some of the
> operations, or use the SERIALIZABLE transaction isolation level to
> ensure that the behavior of all concurrent transactions is
> consistent with the behavior you would see if they were run one at
> a time. The latter approach doesn't introduce any new blocking,
> but it can cause transactions to get an ERROR with a SQLSTATE of
> 40001 at just about any point, so you need to be prepared to
> recognize that and retry those transactions from the beginning (not
> just the last statement of the transaction), ignoring any data read
> during the failed attempt.
>
> You may want to read the entire chapter on concurrency control:
>
> http://www.postgresql.org/docs/current/interactive/mvcc.html
>
> If you are considering using SERIALIZABLE transactions, you should
> probably review the examples in the Wiki, to get an idea of how it
> behaves in various cases:
>
> http://wiki.postgresql.org/wiki/SSI
>
> > I use parallel query to compute faster (load dividing). I guess
> > it would be very slow with about 8 parallel queries with locks.
>
> Well, if you introduce blocking you reduce your parallelism, but if
> you use serializable transactions and there are actually a lot of
> conflicts you can see poor performance because of the errors
> rolling back transactions and the need to retry them from the
> start. The techniques used to implement serializable transactions
> in PostgreSQL are basically a refinement of the Optimistic
> Concurrency Control (OCC) techniques, but generally with far fewer
> retries needed -- the point being that it optimistically assumes
> that there will not be a conflict so that concurrency is better,
> but has to cancel things if that optimism proves to be unfounded.
>
> To make related to changes to multiple tables and maintain coherent
> data, you probably will need to do one or the other.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Hey, thanks for your detailed answer.

The particularity here is that I use a big query with CTE instead of a more
procedural way.
I do sophisticated geometric computing using postGIS. I guess it is a hack
of both postgres and postgis.

I explain better the pg_advisory locks uses I have tried.

*First classic use of pg_advisory, not working :*
CTE_1 (find what rows will be upserted in table_1)
CTE_2 (find what rows will be upserted in table_2)
CTE_3 (find what rows will be upserted in table_3)
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using custom upsert_function)
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function)
CTE_10 (end of query)
each of the upserting function is plpgsql and do something like
pg_advisory_lock(table_number, id of row to be upserted)
with updating AS (update table)
insert into table if not updated
pg_advisory_unlock(table_number,id of row to be upserted)

According to what the list said, it doesn't work because of visibility
issues : the locking work, so we know each processes will upsert the same
thing sequentially. However it will fail because each process has no
visibility on the insert done by the others. So when the second process
upsert the same thing, it will try to insert , and we get 2 inserts for the
same row

*Second non classic use of pg_adivsory, working but too taxing on resources*
CTE_1 (find what rows will be upserted in table_1*,
pg_try_advisory_lock(1,id1)* )
CTE_2 (find what rows will be upserted in table_2,
*pg_try_advisory_lock(2,id2)* )
CTE_3 (find what rows will be upserted in table_3,*
pg_try_advisory_lock(3,id3)* )
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using custom upsert_function,* only if
pg_try_advisory_lock was true* )
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function,* only if
pg_try_advisory_lock was true*)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function, *only if
pg_try_advisory_lock was true*)
*CTE_10 (release locks for (1,id1) , (2,id2) , (3,id3)) )*
CTE_10 (end of query)

This works because we don't even try to upsert a row that is already being
taken care by another process. So there is no visibility issue;
The problem is with ressources, I have to use 100k locks for several
minutes. It grezatly increase memory use.

The difference between the 2 is essentially : in first case we lock each
individual upsert statement. In second case we lock at the query level.

On the bright side, I don't need to wait for the insert in table_1 to
compute table_2, same for table_2 and table_3.
However I can't control when the CTE upserting into each table is executed
(as far as I know, there is no guarantee of execution order when using
multiple cte).

Now about the lock of tables: I guess it wouldn't work for the same reasons
at the individual upsert statement.
If I use it at the whole query level, other parallel process will be
waiting for the whole query to finish before being allowed to go on?
Moreover, whatever LOCK I use will be within the same big transaction, so
the only solution may be :

lock table_1,table_2,table_3;
execute big querry;
unlock table_1, table_2, table_3;

I didn't find any "vanilla" lock. Is "vanilla" some kind of postgres slang?

Thanks for the links to the doc. I didn't know about SERIALIZABLE
transaction, nor about mvcc .
I guess it would be equivalent to put an exeception block in the upsert
function on duplicated primary key?
I don't see how to use it at querry level. The whole query being in one
transaction, I would have to retry it because a few percent rows are
duplicates !

Thanks all for your help,
Cheers,
Rémi-C

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2014-08-12 12:45:37 Re: pg_advisory_lock problem
Previous Message Russell Keane 2014-08-12 09:59:16 Re: Upgrading 9.0 to 9.3 - permission denied to pg_upgrade_utility.log