Re: Moving forward with TDE [PATCH v3]

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Christensen <david(dot)christensen(at)crunchydata(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Moving forward with TDE [PATCH v3]
Date: 2023-11-06 10:26:44
Message-ID: CAEze2WgwzUUM_Af+KmS=0sJmCLGz-m0Nr1NBP8oLnFm2mdHV0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > I'm quite surprised at the significant number of changes being made
> > outside the core storage manager files. I thought that changing out
> > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > would be the most obvious change to implement cluster-wide encryption
> > with the least code touched, as relations don't need to know whether
> > the files they're writing are encrypted, right? Is there a reason to
> > not implement this at the smgr level that I overlooked in the
> > documentation of these patches?
>
> You can't really implement encryption transparently inside an smgr without
> significant downsides. You need a way to store an initialization vector
> associated with the page (or you can store that elsewhere, but then you've
> doubled the worst cse amount of random reads/writes). The patch uses the LSN
> as the IV (which I doubt is a good idea). For authenticated encryption further
> additional storage space is required.

I am unaware of any user of the smgr API that doesn't also use the
buffer cache, and thus implicitly the Page layout with PageHeader
[^1]. The API of smgr is also tailored to page-sized quanta of data
with mostly relation-level information. I don't see why there would be
a veil covering the layout of Page for smgr when all other information
already points to the use of PageHeader and Page layouts. In my view,
it would even make sense to allow the smgr to get exclusive access to
some part of the page in the current Page layout.

Yes, I agree that there will be an impact on usable page size if you
want authenticated encryption, and that AMs will indeed need to
account for storage space now being used by the smgr - inconvenient,
but it serves a purpose. That would happen regardless of whether smgr
or some higher system decides where to store the data for encryption -
as long as it is on the page, the AM effectively can't use those
bytes.
But I'd say that's best solved by making the Page documentation and
PageInit API explicit about the potential use of that space by the
chosen storage method (encrypted, plain, ...) instead of requiring the
various AMs to manually consider encryption when using Postgres' APIs
for writing data to disk without hitting shared buffers; page space
management is already a task of AMs, but handling the actual
encryption is not.

Should the AM really care whether the data on disk is encrypted or
not? I don't think so. When the disk contains encrypted bytes, but
smgrread() and smgrwrite() both produce and accept plaintext data,
who's going to complain? Requiring AMs to be mindful about encryption
on all common paths only adds pitfalls where encryption would be
forgotten by the developer of AMs in one path or another.

> To be able to to use the LSN as the IV, the patch needs to ensure that the LSN
> increases in additional situations (a more aggressive version of
> wal_log_hint_bits) - which can't be done below smgr, where we don't know about
> what WAL logging was done. Nor can you easily just add space on the page below
> md.c, for the purpose of storing an LSN independent IV and the authentication
> data.

I think that getting PageInit to allocate the smgr-specific area would
take some effort, too (which would potentially require adding some
relational context to PageInit, so that it knows which page of which
relation it is going to initialize), but IMHO that would be more
natural than requiring all index and table AMs to be aware the actual
encryption of its pages and require manual handling of that encryption
when the page needs to be written to disk, when it otherwise already
conforms to the various buffer management and file extension APIs
currently in use in PostgreSQL. I would expect "transparent" data
encryption to be handled at the file write layer (i.e. smgr), not
inside the AMs.

Kind regards,

Matthias van de Meent

[^1] ReadBuffer_common uses PageIsVerifiedExtended which verifies that
a page conforms with Postgres' Page layout if checksums are enabled.
Furthermore, all builtin index AMs utilize pd_special, further
implying the use of a PageInit/PageHeader-based page layout.
Additionally, the heap tableAM also complies, and both FSM and VM also
use postgres' Page layout.
As for other AMs that I could check: bloom, rum, and pgvector's
ivfflat and hnsw all use page layouts.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-11-06 10:31:10 Re: Introduction and Inquiry on Beginner-Friendly Issues
Previous Message Aleksander Alekseev 2023-11-06 10:17:12 Re: [PATCH] Small refactoring of inval.c and inval.h