Re: Understanding updates and deadlocks more clearly

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Understanding updates and deadlocks more clearly
Date: 2022-09-03 00:28:17
Message-ID: d96be449-285d-b4af-283e-3618a0d62e57@gmail.com
Lists: pgsql-admin

On 9/2/22 16:15, Wells Oliver wrote:
> Hey folks: I want to clarify a few things about locks on updates:
>
> 1- can two processes update the same table if they are updating different
> rows?

You can test this yourself by trying it.  (Short answer: yes. PostgreSQL takes
row-level locks for UPDATE, so two sessions updating different rows of the
same table don't block each other.)

>
> 2- in the event of two processes trying to update the same row, does this
> always result in a deadlock?

If txn 1 starts, updates TABLE_A then TABLE_B and commits, while "at the
same time" txn 2 (say it's the same code but a different end user processing
different data) does the same thing to the same or similar rows, you *might*
have a *lock conflict* (depending on RDBMS, transaction type, type of
indices, length of transaction, how well it's written, etc).

OTOH, you might get a *deadlock* if txn 1 starts, updates TABLE_A then
TABLE_B and commits, while "at the same time" txn 2 updates TABLE_*B* then
TABLE_A.

In that case, txn 1 locks a chunk of TABLE_A, while txn 2 locks a chunk of
TABLE_*B*, and then txn 1 tries to lock TABLE_*B* but can't, because txn 2
has already locked it.  "Simultaneously", txn 2 tries to lock TABLE_*A* but
can't, because txn 1 has already locked it.

That's a simplistic form of deadlock, which competent DBMSs have been able to
handle for 25+ years: the server detects the cycle of waiters and aborts one
of the transactions with a deadlock error, letting the other proceed.  (Even
so, you can sometimes force the issue by using SERIALIZABLE transaction mode.)
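The classic application-side fix for that scenario is to always acquire locks
in the same order, so the circular wait can never form.  Here is a minimal
sketch of that principle using Python threads (a toy model of the two-table
scenario above, not PostgreSQL itself):

```python
import threading

# Two locks standing in for row locks on TABLE_A and TABLE_B.
lock_a = threading.Lock()
lock_b = threading.Lock()

results = []

def txn(name, first, second):
    # Both "transactions" take locks in the SAME order (A, then B),
    # so neither can ever hold B while waiting for A: no circular wait.
    with first:
        with second:
            results.append(name)

t1 = threading.Thread(target=txn, args=("txn1", lock_a, lock_b))
t2 = threading.Thread(target=txn, args=("txn2", lock_a, lock_b))
t1.start(); t2.start()
t1.join(); t2.join()

print(sorted(results))  # both complete; no deadlock
```

If txn2 instead took lock_b first and lock_a second, the two threads could
each grab one lock and wait forever on the other: exactly the TABLE_A/TABLE_B
deadlock described above.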

>
> 3- is there a way to create a system whereby potentially concurrent
> processes can update the same row in a given table, can the server be made
> to queue those updates, etc?

This is what DBMSs have been doing for *50+ years*.  Of course, a single-CPU
system can by definition only do one thing at a time; and while multi-CPU
systems can execute multiple CPU op-codes at once, at some point you'll hit a
serialized bottleneck.  For example, the disk controller can only write
*one "thing"* at a time to a disk block.

Thus, at some deep stage, everything is serialized.  Having said that, there
are many layers between your application sending SQL to PostgreSQL and the
disk; there are hints that you can provide the RDBMS, and also hints that
the RDBMS can figure out for itself.

PostgreSQL uses MVCC (multi-version concurrency control), which keeps
multiple versions of a row on disk; each transaction sees only the one
version visible to its snapshot.  At some point older row versions are marked
"dead", and until they are reclaimed they just sit there wasting space and
I/O (since disks read blocks, not PostgreSQL rows).  This is why VACUUM is so
important for UPDATE/DELETE-heavy tables.
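A toy illustration of the version-chain idea (a drastic simplification of
PostgreSQL's actual tuple storage; real visibility rules involve xmin/xmax,
commit status, and snapshot contents):

```python
# Toy MVCC: a "row" is a list of versions, each tagged with the txid that
# wrote it.  A reader with snapshot S sees the newest version whose txid <= S.
row_versions = [
    {"txid": 100, "value": "v1"},   # old version: "dead" once no snapshot needs it
    {"txid": 205, "value": "v2"},   # current version
]

def visible(versions, snapshot_txid):
    live = [v for v in versions if v["txid"] <= snapshot_txid]
    return max(live, key=lambda v: v["txid"])["value"] if live else None

print(visible(row_versions, 150))  # an older snapshot still sees "v1"
print(visible(row_versions, 300))  # a newer snapshot sees "v2"
```

The old "v1" version must stay on disk as long as any snapshot might still
need it; once no snapshot can see it, it is dead weight until VACUUM reclaims
the space.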

Transaction length also plays into concurrency (or lack thereof).  A txn
that starts, makes some quick DB changes (having done as much work as
possible beforehand), and then commits allows better concurrency than one
that starts, does some work, does a bunch of unrelated tasks, inefficiently
modifies the database, and then commits.  *Or worse*, one that starts, does
some work, and then *waits to commit* until the user (who just went to
lunch!) presses a key.
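The "do the expensive work first, hold locks briefly" pattern can be sketched
like this (again a toy model using a Python lock in place of a row lock):

```python
import threading

lock = threading.Lock()  # stands in for the row lock a transaction would hold

def good_txn(payload):
    # All expensive preparation happens BEFORE taking the lock...
    prepared = [x * x for x in payload]
    # ...so the lock is held only for the quick "DB change" itself.
    with lock:
        total = sum(prepared)
    return total

print(good_txn(range(5)))
```

The anti-pattern is the mirror image: acquire the lock (BEGIN the
transaction), then do the slow preparation, user interaction, or unrelated
work while every other session that needs the same rows sits waiting.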

--
Angular momentum makes the world go 'round.
