Row locking during UPDATE

From: "David F(dot) Skoll" <dfs(at)roaringpenguin(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Row locking during UPDATE
Date: 2003-09-04 12:21:17
Message-ID: Pine.LNX.4.55.0309040812310.1962@shishi.roaringpenguin.com
Lists: pgsql-admin

Hi,

I have a weird problem and want to know if I understand what's happening.
I have a table like this:

create table statistics (
    date DATE DEFAULT current_date,
    key TEXT,
    value INTEGER DEFAULT 0,
    UNIQUE(date, key)
);

and I have a bunch of clients running. Often, a client does
something like this:

UPDATE statistics SET value = value + 1
WHERE key = 'KEY' AND date = current_date;

What appears to be happening is this:

1) Sometimes a whole bunch of clients try to update the same row at once,
and I see several postgresql processes marked "UPDATE waiting".

2) Because the client processes must complete quickly, a supervisor
process kills any that take too long.

3) The postgresql processes in the "UPDATE waiting" state seem to be
blocked on a semaphore, and they do not notice that the connection has
been closed when the client is killed.

4) New client processes are spawned and eventually try to update the row.

As a result, we end up with more and more postgresql processes until the
connection limit is reached, because processes in the "UPDATE waiting"
state don't notice the connection has been broken.
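
If I understand the locking right, the same behaviour should show up in a
two-psql-session test like this (an untested sketch; the table and key are
the ones above, and I haven't run this against the production box):

-- session A
BEGIN;
UPDATE statistics SET value = value + 1
    WHERE key = 'KEY' AND date = current_date;
-- session A now holds the row lock until it commits or rolls back

-- session B, same statement: it blocks here, and ps shows that backend
-- as "UPDATE waiting"
UPDATE statistics SET value = value + 1
    WHERE key = 'KEY' AND date = current_date;

-- from a third session, the ungranted lock requests are visible:
SELECT pid, relation, mode, granted FROM pg_locks WHERE NOT granted;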

Questions:

1) Am I on the right track? Is this how PostgreSQL works?

2) I plan to solve it with a low-contention table like this:

create table low_contention_stats (
    date DATE DEFAULT current_date,
    key TEXT,
    randomizer INTEGER,
    value INTEGER DEFAULT 0,
    summarized BOOLEAN DEFAULT FALSE,
    UNIQUE(date, key, randomizer, summarized)
);

Each client will pick a random "randomizer" value, so the odds of two
clients trying to update the same row at the same time are low. Once
a day, I summarize the low_contention_stats table and put the summary
in statistics.

A statistics update looks like this:

UPDATE low_contention_stats SET value = value + 1
WHERE date = current_date AND key = 'KEY'
AND randomizer = my_random_val AND not summarized;
(If the update updates 0 rows, we do an INSERT)
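
Spelled out, the fallback is roughly this (a sketch; the client just checks
the row count reported by the UPDATE, and 42 stands in for my_random_val):

UPDATE low_contention_stats SET value = value + 1
    WHERE date = current_date AND key = 'KEY'
    AND randomizer = 42 AND not summarized;
-- if that reports "UPDATE 0", fall back to:
INSERT INTO low_contention_stats (date, key, randomizer, value)
    VALUES (current_date, 'KEY', 42, 1);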

The summary process that copies into "statistics" looks like this:

# "Freeze" existing rows -- any updates during the summary process
# will have to insert new rows where summarized is false
UPDATE low_contention_stats SET summarized = 't';

# Summarize stats
SELECT date, key, sum(value) FROM low_contention_stats WHERE summarized
GROUP BY date, key;

# Then we iterate over the results, updating "statistics"
DELETE FROM low_contention_stats WHERE summarized;
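
For completeness, folding one (date, key) group from that SELECT into
"statistics" would look something like this (a sketch; the literals stand
in for one aggregated row):

-- e.g. the group ('2003-09-04', 'KEY') summed to 57
UPDATE statistics SET value = value + 57
    WHERE date = '2003-09-04' AND key = 'KEY';
-- if that reports "UPDATE 0", insert the row instead:
INSERT INTO statistics (date, key, value)
    VALUES ('2003-09-04', 'KEY', 57);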

Will this help?

I can't easily test it, because I only see the problem under high load,
and the only high-load environment I have access to is a production one. :-(

Regards,

David.
