Re: Commit Timestamp and LSN Inversion issue

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, tomas(at)vondra(dot)me
Subject: Re: Commit Timestamp and LSN Inversion issue
Date: 2024-09-04 08:34:51
Message-ID: CAJ7c6TNZY3idBDoGOaoS_fF6vJuBMjnDhZC4i5V0J58s2SJ3AA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Shveta,

> While discussing Logical Replication's Conflict Detection and
> Resolution (CDR) design in [1] , it came to our notice that the
> commit LSN and timestamp may not correlate perfectly i.e. commits may
> happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
> because, during the commit process, the timestamp (xactStopTimestamp)
> is captured slightly earlier than when space is reserved in the WAL.
> [...]
> There was a suggestion in [3] to acquire the timestamp while reserving
> the space (because that happens in LSN order). The clock would need to
> be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
> main problem why it's being done outside the critical section, because
> gettimeofday() may be quite expensive. There's a concept of hybrid
> clock, combining "time" and logical counter, which might be useful
> independently of CDR.

I don't think you can rely on a system clock for conflict resolution.
In a corner case a DBA can move the clock forward or backward between
recordings of Ts1 and Ts2. On top of that there is no guarantee that
2+ servers have synchronised clocks. It seems to me that what you are
proposing will just hide the problem instead of solving it in the
general case.

This is the reason why {latest|earliest}_timestamp_wins strategies you
are proposing to use for CDR are poor strategies. In practice they
work as random_key_wins which is not extremely useful (what it does is
basically _deleting_ random data, not solving conflicts). On top of
that strategies like this don't take into account checks and
constraints the database may have, including high-level constraints
that may not be explicitly defined in the DBMS but may exist in the
application logic.

Unfortunately I can't reference a particular article or white paper on
the subject but I know that "last write wins" was widely criticized
back in the 2010s when people were building distributed systems on
commodity hardware. In this time period I worked on several projects
as a backend software engineer and I can assure you that LWW is not
something you want.

IMO the right approach to the problem would be defining procedures for
conflict resolution that may not only semi-randomly choose between two
tuples but also implement a user-defined logic. Similarly to INSERT
INTO ... ON CONFLICT ... semantics, or similar approaches from
long-lived and well-explored distributed system, e.g. Riak.
Alternatively / additionally we could support CRDTs in Postgres.

--
Best regards,
Aleksander Alekseev

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-09-04 08:40:06 Re: Virtual generated columns
Previous Message Kyotaro Horiguchi 2024-09-04 08:30:13 RecoveryTargetAction is left out in xlog_interna.h