Re: PostgreSQL design question

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net>
Cc: postgres jdbc <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: PostgreSQL design question
Date: 2004-01-02 03:04:58
Message-ID: 2601.1073012698@sss.pgh.pa.us
Lists: pgsql-jdbc

Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net> writes:
> To determine the cache coherency of my java memory cache of database
> table content,
> I maintain an "update table" for all tables to be cached.
> Whenever a transaction updates a table that is cached, it must also
> update the 'update table'.

Hm. I do not actually see what it buys you to store n_rows in the
update table; seems like the update time is the only thing that is
useful. (With concurrent inserts under MVCC, n_rows is in the eye
of the beholder anyway, so if your caching code *is* relying on it
then it is wrong.)
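
To illustrate the point (hypothetical table "foo", two concurrent
sessions, default read-committed behavior):

    -- session 1
    BEGIN;
    INSERT INTO foo VALUES (1);
    SELECT count(*) FROM foo;   -- sees its own new row

    -- session 2, while session 1 is still open
    SELECT count(*) FROM foo;   -- does not see it yet

    -- session 1
    COMMIT;
    -- only now do other sessions count the new row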

However, getting rid of n_rows doesn't eliminate the fundamental problem
that rows of the update table are serialization bottlenecks.

Recognizing that update_time is all you can usefully propagate anyway,
consider this design:

* Get rid of primary key --- there are allowed to be multiple rows in
the update table for any given data table.

* Updating transactions insert (not update) a row identifying the table
they changed and giving their update_time.

* Caching transactions look at the max update_time for each table they
are interested in. This is cheap to do efficiently with the right query
and index --- see past discussions about optimizing max() operations.
(Hint: you do *not* write max(); see the sketch below.)

* Periodically, a maintenance process deletes all but the newest row for
each table, and then does a VACUUM of the update table.

This avoids serializing on any individual row of the update table.
The maintenance process doesn't block anything either (it's okay if
it happens to leave more than one valid row for a particular data
table on any given run).
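
Roughly, in SQL, a sketch of what I have in mind (untested; table and
column names are only for illustration):

    -- one row per update event; deliberately no primary key
    CREATE TABLE update_log (
        table_name  text NOT NULL,
        update_time timestamptz NOT NULL
    );
    CREATE INDEX update_log_idx ON update_log (table_name, update_time);

    -- updating transaction: insert, don't update
    INSERT INTO update_log VALUES ('some_table', now());
    -- (but see below about which timestamp to use)

    -- caching transaction: newest update_time for a table, written so the
    -- index can be used rather than a seqscan under max()
    SELECT update_time FROM update_log
     WHERE table_name = 'some_table'
     ORDER BY table_name DESC, update_time DESC
     LIMIT 1;

    -- periodic maintenance: keep only the newest row per table, then vacuum
    DELETE FROM update_log
     WHERE update_time < (SELECT max(update_time) FROM update_log u2
                          WHERE u2.table_name = update_log.table_name);
    VACUUM update_log;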

However, this idea has a serious problem: it does not quite work because
transactions may not commit in the same order they start. Consider this
scenario:

Transaction A starts

A inserts some data

Transaction B starts

B inserts some data

B inserts its update_time

B commits

Caching transaction C inspects max(update_time)

C updates its cache

A inserts its update_time

A commits

C inspects max(update_time)

C updates its cache??

If update_time is the transaction's start time (now()), then C will in
fact fail to update its cache for A's updates, because it will still see
max(update_time) as being B's value. A's update_time is smaller than B's.

We can narrow the window for this failure by using a true wall-clock
reading (timeofday(); note that now() and current_timestamp are both
frozen at transaction start) taken as close as possible before the
transaction commits --- but I don't think we can eliminate the window
entirely.
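
For instance, as the very last step before COMMIT (same hypothetical
update_log table as above; timeofday() returns text, hence the cast):

    -- last statement in the updating transaction, immediately before COMMIT:
    INSERT INTO update_log
        VALUES ('some_table', timeofday()::timestamptz);
    COMMIT;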

BTW, I think your original design is vulnerable to this same problem.
You may or may not be saved from it by the fact that you are serializing
on update-table rows; it would depend on the details of your coding.

I thought about using a sequence generator instead of now() to generate
the update_time values, but that doesn't seem to help. Does anyone
see a way to make this 100% reliable?

> I've avoided using any notifications from the server (I'm not sure JDBC
> even supports it, haven't tried),
> since I'm unsure such mechanisms are reliable.

LISTEN/NOTIFY would be more reliable than this, mainly because it
ensures that notifications are not delivered until after the sending
transaction commits (and therefore you cannot miss seeing its updates).
However, I'm not sure JDBC supports it either :-(. Anyone know?
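
For what it's worth, the SQL side of it is trivial (the condition name
is arbitrary; you might use one per cached table):

    -- cache-holding connection, once per condition of interest:
    LISTEN some_table_changed;

    -- any updating transaction, before it commits:
    NOTIFY some_table_changed;
    -- listeners see the notification only after this transaction commits,
    -- and multiple notifies from one transaction are folded into one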

> Furthermore, app servers in some hostile customer network environments
> have idle network connections dropped after some number of hours. If
> that happens, the app server will miss messages and have to
> potentially completely reinitialize its cache.

[ shrug... ] Use dummy keepalive transactions to prevent the connection
from looking idle for long periods. In any case, I can't believe that
infrequent cache reloads are a fatal objection.

> I'm also curious if anybody has developed a sequence generator for
> PostgreSQL that doesn't permit
> gaps in the sequence based on failed transactions.

It's easy to see that any such thing *must* be a serialization
bottleneck: you can't assign the next sequence ID till you see if
the guy who took the previous one commits. Otherwise you have gaps.
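
The classic single-row counter makes the point concrete (hypothetical
table and series name; the row lock is what matters):

    -- hypothetical gapless counter, one row per ID series
    CREATE TABLE id_counter (
        series  text PRIMARY KEY,
        last_id integer NOT NULL
    );

    -- inside each transaction that needs the next ID:
    SELECT last_id + 1 FROM id_counter WHERE series = 'orders' FOR UPDATE;
    UPDATE id_counter SET last_id = last_id + 1 WHERE series = 'orders';
    -- the row lock acquired here is held until this transaction commits or
    -- rolls back, so every other would-be ID taker queues up behind it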

You can imagine mechanisms that will reissue the IDs taken by failed
transactions to the next requestors. However, this just means that you
have transient instead of permanent gaps in the ID sequence (since
later IDs might be used by xacts that commit before those that
successfully use reissued IDs). Most of the scenarios I can think of
for wanting non-gapping ID sequences will fail just as miserably with
transient gaps as with permanent ones.

In fact, if your criterion for "no gaps" is "no one who looks at the
current set of committed values ever sees a gap", then you really don't
have any choice: you have to ensure that the xact that took ID n
commits before the one that took ID n+1 commits; it's not enough that
they both commit eventually, they have to commit in the right order. So
you'd have to
have serialization at commit time even if you avoided serializing at the
ID-issuance point.

I would counsel taking a very hard look at why you think you need
non-gapping IDs, rather than engaging in wishful thinking about finding
an efficient way to have them.

regards, tom lane
