Re: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication

From: Dean <ds(dot)blue797(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication
Date: 2025-03-19 00:56:04
Message-ID: CALWmXtuyvdL5zyYKgnszEVjX-Ru7jmGpKhn8zobXRbpoWRFFSg@mail.gmail.com
Lists: pgsql-hackers

Unfortunately, neither column lists nor row filters can provide the level
of control I'm proposing. These revised examples might help illustrate the
use case for DRF:

Alice, Bob, and Eve subscribe to changes on a `friend_requests` table.
Row-level security restricts CRUD access based on user IDs.
1. Per-subscriber column control: Bob makes a change to the table. Alice
should receive the entire record, while Eve should receive only the
timestamp and no other columns. Why DRF is needed: Column lists are static
and fixed per publication, so they apply equally to every subscriber of
that publication and can't distinguish Alice's subscription from Eve's.
2. Per-subscriber event suppression: Bob DELETEs a row from the table.
Alice should see the DELETE event, while Eve should not even be aware that
an event occurred. Why DRF is needed: Row filters must be deterministic
expressions over the row itself, which makes them unsuitable for
per-subscriber filtering based on session data.

The goal of DRF is to allow per-subscriber variations in change broadcasts,
enabling granular control over what data is sent to each subscriber based
on their session context.
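
To make the intended semantics concrete, here is a rough Python sketch of
the two cases above. It is only a model of the behaviour, not PostgreSQL
code: `Change`, `Session`, `friend_requests_filter`, `broadcast`, and the
column names `sender`, `recipient`, and `updated_at` are all hypothetical
names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical model types; none of these exist in PostgreSQL.
@dataclass(frozen=True)
class Change:
    op: str      # "INSERT", "UPDATE", or "DELETE"
    table: str
    row: dict    # decoded column values for the affected row


@dataclass(frozen=True)
class Session:
    user: str    # subscriber identity taken from the session context


# A deferred filter sees the decoded change plus the subscriber's session.
# It returns the change (possibly with columns removed), or None to
# suppress the event entirely for that subscriber.
DeferredFilter = Callable[[Change, Session], Optional[Change]]


def friend_requests_filter(change: Change, session: Session) -> Optional[Change]:
    row = change.row
    # Assumed policy: participants in the request receive the full record.
    if session.user in (row["sender"], row["recipient"]):
        return change
    # Non-participants never learn that a DELETE happened (example 2).
    if change.op == "DELETE":
        return None
    # Otherwise non-participants receive only the timestamp (example 1).
    return Change(change.op, change.table, {"updated_at": row["updated_at"]})


def broadcast(change: Change, sessions: List[Session],
              flt: DeferredFilter) -> Dict[str, Change]:
    """Apply the filter once per subscriber and collect what each one gets."""
    delivered = {}
    for session in sessions:
        filtered = flt(change, session)
        if filtered is not None:
            delivered[session.user] = filtered
    return delivered
```

With a row where Bob is the sender and Alice the recipient, an UPDATE is
delivered to Alice in full and to Eve as a timestamp-only record, while a
DELETE reaches Alice but not Eve. The point is that the decision depends on
both the decoded row and the subscriber's session, which is what static
column lists and row filters cannot express.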

Best,
Dean S

On Mon, Mar 17, 2025 at 4:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Sun, Mar 16, 2025 at 12:59 AM Dean <ds(dot)blue797(at)gmail(dot)com> wrote:
> >
> > I'd like to propose an enhancement to PostgreSQL's logical replication
> > system: Deferred Replica Filtering (DRF). The goal of this feature is to
> > provide more granular control over which rows are replicated by applying
> > publication filters after the WAL decoding process, before sending data to
> > subscribers.
> >
> > Currently, PostgreSQL's logical replication filters apply
> > deterministically. Deferred filtering, however, operates after the WAL has
> > been decoded, giving it access to the complete row data and making
> > filtering decisions based on mutable values. Additionally, record columns
> > may be omitted by the filter.
> >
> > This opens up several possibilities for granular control. Consider the
> > following examples:
> > Alice and Bob subscribe to changes on a table with RLS enabled, allowing
> > CRUD operations based on user's IDs.
> > 1. Alice needs to know the timestamp at which Bob updated the table.
> > With DRF, we can omit all columns except for the timestamp.
> > 2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all
> > columns on any deleted row, potentially exposing complete records he
> > shouldn't be authorized to view. DRF can filter these rows out.
> >
> > Deferred replica filtering allows for session-specific, per-row, and
> > per-column filtering - features currently not supported by existing
> > replication filters, enhancing security and data privacy.
> >
>
> We provide column lists [1] and row filters [2]. Doesn't that suffice
> the need, if not, kindly let us know what exactly you need with some
> examples.
>
> [1] -
> https://www.postgresql.org/docs/devel/logical-replication-col-lists.html
> [2] -
> https://www.postgresql.org/docs/devel/logical-replication-row-filter.html
>
> --
> With Regards,
> Amit Kapila.
>
