Syncing an application cache with xmin

From: Jason Dusek <jason(dot)dusek(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Syncing an application cache with xmin
Date: 2013-02-03 15:22:44
Message-ID: CAO3NbwO=69m9Y4e2jU-pKLWc42-+s0fJMRHgAZM23HL0mWmJ+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello List,

Imagine a table of records describing the up-to-date state of
some objects:

id | t | data columns ...
------+-------------+------------------
uuid | timestamptz | ...

The `id' column is a PRIMARY KEY. When an object is updated, the
old record is archived and a new record, with `t' set to
transaction_timestamp(), is added. So the table really contains
just the up-to-date state of objects.

An application would like to cache this up to date state,
synchronizing at regular intervals. The application might not
maintain a persistent connection to the database, so
LISTEN/NOTIFY is not to be preferred. What are some algorithms
by which one can retrieve new rows since the last sync?

Since the sync is made at regular intervals -- let's call it
once a minute -- one could query for all records with `t' that
is less than a minute old. For safety's sake, one can query for
*two* minutes of data.

This would seem to solve the problem. It works pretty well.
However, a new challenger appears: imagine an inserting
transaction that runs for five minutes. None of rows are visible
and then they are added -- with a transaction_timestamp that is
five minutes in the past! These rows are never synced.

Now you could argue that writes shouldn't occur in such long
transactions and you would be right; but it happens that one-off
tools exhibit bad behaviour one wouldn't accept in a production
application, and fixing the tools can be hard to make happen.

I was reading today about the `xmin' column, `txid_current()`
and `pg_export_snapshot()` and I wonder if there is not a better
way to sync, using transaction IDs instead of time.

http://www.postgresql.org/docs/9.2/static/ddl-system-columns.html
http://www.postgresql.org/docs/9.2/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

The idea would be, to store information about the last XID in
the last sync and search for XIDs committed since then upon
reconnecting for sync. Perhaps `txid_current_snapshot()'
preserves enough information. Is this a plausible technique?
Would it be a misuse of XIDs?

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-02-03 16:09:40 Re: Syncing an application cache with xmin
Previous Message Fabrízio de Royes Mello 2013-02-03 14:46:56 Re: Diferences between IN and EXISTS?