From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) |
Date: | 2019-08-06 15:31:58 |
Message-ID: | CAD21AoBC9xQLcbxS_Hxa220L7s8tpttXczkBdA5Ekz0V7MbXcw@mail.gmail.com |
Lists: | pgsql-hackers |
Hi Bruce,
(off-list)
I think I'm missing something about the basics of encryption. Let me
ask you about it off-list.
On Tue, Aug 6, 2019 at 11:36 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Tue, Aug 6, 2019 at 12:00:27PM +0900, Masahiko Sawada wrote:
> > What I'm thinking about for WAL encryption is that WAL records in the
> > WAL buffer are not encrypted. When writing to disk we copy the contents
> > of the 8k WAL page to a temporary buffer, encrypt it, and then write
> > it. With the current behavior, every time we write WAL we write whole
> > 8k WAL pages rather than individual WAL records.
> >
> > The nonce for WAL encryption is {segment number, counter}. Suppose we
> > write 100 bytes of WAL at the beginning of the first 8k WAL page in WAL
> > segment 50. We encrypt the entire 8k WAL page with the nonce starting
> > from {50, 0} and write it to disk. After that, suppose we append 200
> > bytes of WAL to the same WAL page. We again encrypt the entire 8k WAL
> > page with the nonce starting from {50, 0} and write it to disk. The
> > two 8k WAL pages we wrote to disk are different, but we encrypted
> > them with the same nonce, which I think is bad.
>
> OK, I think you are missing something. Let me go over the details.
> First, I think we are all agreed we are using CTR for heap/index pages,
> and for WAL, because CTR allows byte granularity, it is faster, and
> might be more secure.
>
> So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
> as the nonce for each page. In CTR mode, we do that by creating an 8k
> bitstream, which is generated in 16-byte chunks with AES by incrementing
> the counter used for each 16-byte chunk. We then XOR the bits with what
> we want to encrypt, and skip the LSN and CRC parts of the page.
>
> For WAL, we effectively create a 16MB bitstream, though we can create it
> in parts as needed. (Creating it in parts is easier in CTR mode.) The
> nonce is the segment number, but each 16-byte chunk uses a different
> counter. Therefore, even if you are encrypting the same 8k page several
> times in the WAL, the 8k page would be different because of the LSN (and
> other changes), and the bitstream you encrypt/XOR it with would be
> different because the counter would be different for that offset in the
> WAL.
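To make the quoted scheme concrete, here is a minimal Python sketch of CTR keystream generation as described above: 16-byte chunks with an incrementing counter, XORed with the data, leaving the LSN and checksum bytes of a heap page in the clear, plus a WAL keystream whose counter is derived from the byte offset within the segment. It is an illustration under stated assumptions, not PostgreSQL code: the HMAC-SHA256 block function stands in for AES (Python's standard library has no AES), and the helper names, the nonce encoding, and the 10-byte unencrypted prefix (8 LSN bytes plus a 2-byte checksum) are hypothetical.

```python
import hashlib
import hmac
import struct

BLOCK = 16        # AES block size: the keystream is built 16 bytes at a time
PAGE_SIZE = 8192  # heap/index page size
SKIP = 10         # assumption: 8 LSN bytes + 2 checksum bytes stay unencrypted


def block_fn(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for AES_k(nonce || counter). Assumption: HMAC-SHA256 truncated
    to 16 bytes replaces AES. CTR mode only ever runs the block function
    forward (for both encryption and decryption), so a keyed PRF suffices."""
    msg = nonce + struct.pack(">Q", counter)
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]


def keystream(key: bytes, nonce: bytes, offset: int, length: int) -> bytes:
    """Keystream bytes [offset, offset + length). Chunk i uses counter i, so
    any slice can be generated on its own, i.e. "in parts"."""
    first = offset // BLOCK
    last = (offset + length + BLOCK - 1) // BLOCK
    stream = b"".join(block_fn(key, nonce, c) for c in range(first, last))
    start = offset - first * BLOCK
    return stream[start:start + length]


def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))


def encrypt_heap_page(key: bytes, page: bytes, blkno: int) -> bytes:
    """XOR the page with its keystream, skipping the LSN and checksum bytes.
    Hypothetical nonce: the page's 8 LSN bytes followed by the block number.
    Because the LSN stays in the clear, the same call also decrypts."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected an 8k page")
    nonce = page[:8] + struct.pack(">I", blkno)
    body_ks = keystream(key, nonce, SKIP, PAGE_SIZE - SKIP)
    return page[:SKIP] + xor(page[SKIP:], body_ks)


def encrypt_wal(key: bytes, segno: int, seg_offset: int, data: bytes) -> bytes:
    """Encrypt (or decrypt) WAL bytes located at seg_offset in segment segno.
    Nonce = segment number; counter = seg_offset // 16."""
    nonce = struct.pack(">Q", segno)
    return xor(data, keystream(key, nonce, seg_offset, len(data)))
```

For example, encrypting 300 bytes at segment offset 8192 in one call gives the same ciphertext as encrypting the first 100 bytes at offset 8192 and the next 200 at offset 8292, which is what lets the 16MB WAL bitstream be generated in parts.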
So you mean that, for example, when we append a 100-byte WAL record, we
encrypt only those 100 bytes?
For WAL encryption, if we encrypt the entire 8k WAL page and write the
entire page, the encrypted-and-written page will contain 100 bytes of WAL
record data and (8192-100) bytes of garbage (ignoring the WAL page header
for simplicity), while the WAL data in the WAL buffer remains unencrypted.
If we then append 200 more bytes, the encrypted-and-written page will
contain 300 bytes of WAL record data and (8192-300) bytes of garbage,
though again the data in the WAL buffer remains unencrypted.
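In this case I think the first 100 bytes of the two 8k WAL pages are the
same, because we encrypted both from the beginning of the page with
counter = 0. But the next 200 bytes are different: they are (encrypted)
garbage in the former case but (encrypted) WAL record data in the
latter case, even though both were encrypted with the same keystream. I
think that's a problem, because XORing the two on-disk versions would
give the XOR of the garbage and the WAL record data without the key.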
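The concern above is the keystream-reuse ("two-time pad") problem of CTR mode. A minimal sketch, again using a hypothetical HMAC-SHA256 stand-in for AES and assuming for illustration that the unused tail of the page is zero-filled, shows that XORing the two ciphertexts cancels the keystream. The function names are hypothetical.

```python
import hashlib
import hmac
import struct

PAGE_SIZE = 8192


def keystream(key: bytes, segno: int, offset: int, length: int) -> bytes:
    """Hypothetical WAL keystream: nonce = segment number, counter = offset // 16.
    HMAC-SHA256 truncated to 16 bytes stands in for AES (an assumption)."""
    first, last = offset // 16, (offset + length + 15) // 16
    stream = b"".join(
        hmac.new(key, struct.pack(">QQ", segno, c), hashlib.sha256).digest()[:16]
        for c in range(first, last)
    )
    start = offset - first * 16
    return stream[start:start + length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def write_page(key: bytes, segno: int, page_offset: int, page: bytes) -> bytes:
    """Encrypt the whole 8k page from its own start, as in the scenario where
    the page is re-encrypted and rewritten after each append."""
    return xor(page, keystream(key, segno, page_offset, len(page)))


if __name__ == "__main__":
    key = b"demo-key"
    rec1, rec2 = b"A" * 100, b"B" * 200
    first = rec1 + bytes(PAGE_SIZE - 100)          # page after the first write
    second = rec1 + rec2 + bytes(PAGE_SIZE - 300)  # same page after the append
    c1 = write_page(key, 50, 0, first)
    c2 = write_page(key, 50, 0, second)
    leak = xor(c1, c2)  # keystream cancels: equals xor(first, second)
    print(c1[:100] == c2[:100], leak[100:300] == rec2)  # prints: True True
```

With a zero-filled tail, the XOR of the two on-disk pages reveals the appended 200 bytes in the clear; with arbitrary garbage it still reveals their XOR with that garbage.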
On the other hand, if we encrypt the 8k WAL page with a different nonce
counter after appending the 200-byte WAL record, the first 100 bytes
(and of course the entire 8k page) will be different. However, that is
the same as changing an already-flushed WAL record on disk, which is bad.
Also, if we encrypt only the appended data instead of the entire 8k page,
we would need to store somewhere how many bytes of the WAL page hold
valid data. Otherwise reading WAL would not work correctly.
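A minimal sketch of the append-only alternative in the last paragraph, assuming the counter is derived from the byte offset within the segment (as in the quoted description) and that a hypothetical `valid_len` is tracked for the page. Only newly appended bytes are encrypted, so already-flushed ciphertext never changes and no keystream offset is applied to two different plaintexts; a reader relies on `valid_len` to know where real WAL data ends. The class name, the `valid_len` field, and the HMAC-SHA256 stand-in for AES are assumptions for illustration.

```python
import hashlib
import hmac
import struct


def keystream(key: bytes, segno: int, offset: int, length: int) -> bytes:
    """Hypothetical WAL keystream: nonce = segment number, counter = offset // 16.
    HMAC-SHA256 truncated to 16 bytes stands in for AES (an assumption)."""
    first, last = offset // 16, (offset + length + 15) // 16
    stream = b"".join(
        hmac.new(key, struct.pack(">QQ", segno, c), hashlib.sha256).digest()[:16]
        for c in range(first, last)
    )
    start = offset - first * 16
    return stream[start:start + length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class EncryptedWalPage:
    """Hypothetical on-disk image of one 8k WAL page that only ever encrypts
    newly appended bytes, plus the valid-length bookkeeping a reader needs."""

    def __init__(self, key: bytes, segno: int, page_offset: int, size: int = 8192):
        self.key, self.segno, self.page_offset = key, segno, page_offset
        self.size = size
        self.disk = bytearray(size)  # ciphertext as written to disk
        self.valid_len = 0           # bytes of real WAL data on this page

    def append(self, data: bytes) -> None:
        start = self.valid_len
        if start + len(data) > self.size:
            raise ValueError("record does not fit on this page")
        # Keystream for exactly these segment offsets; earlier bytes untouched.
        ks = keystream(self.key, self.segno, self.page_offset + start, len(data))
        self.disk[start:start + len(data)] = xor(data, ks)
        self.valid_len += len(data)

    def read(self) -> bytes:
        """Decrypt only the valid prefix; bytes past valid_len are not WAL."""
        ks = keystream(self.key, self.segno, self.page_offset, self.valid_len)
        return xor(bytes(self.disk[:self.valid_len]), ks)
```

Appending 100 bytes and then 200 bytes leaves the first 100 ciphertext bytes unchanged and yields the same 300 bytes of ciphertext as encrypting all 300 bytes at once from offset 0.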
Please let me know what I am missing.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center