From: | Francisco Olarte <folarte(at)peoplecall(dot)com> |
---|---|
To: | Johannes Linke <johannes(dot)linke(at)posteo(dot)de> |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: System column xmin makes anonymity hard |
Date: | 2020-05-12 19:11:40 |
Message-ID: | CA+bJJbw8OcvO-tY48BFKpejBWEE5gJdjLeRBVbgvaP27Z29Lag@mail.gmail.com |
Lists: | pgsql-general |
Johannes.
On Tue, May 12, 2020 at 8:05 PM Johannes Linke <johannes(dot)linke(at)posteo(dot)de> wrote:
> since 9.4, VACUUM FREEZE just sets a flag bit instead of overwriting xmin with FrozenTransactionId [1]. This makes it harder to build applications with a focus on data reduction.
> We have an app that lets people anonymously vote on stuff exactly once. So we save the vote in one table without any explicit connection to the voting user, and separate from that a flag that this person gave their vote. That has to happen in the same transaction for obvious reasons, but now the xmin of those two data points allows to connect them and to de-anonymize the vote.
> We can of course obfuscate this connection, but our goal is to not keep this data at all to make it impossible to de-anonymize all existing votes even when gaining access to the server. The best idea we had so far is more of a workaround: Do dummy updates to large parts of the vote table on every insert so lots of tuples have the same xmin, and then VACUUMing.[2]
And even without the xmin someone could dump the ctids and correlate
them if you are not careful.
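
To make the risk concrete, here is a minimal sketch of both correlations.
The table and column names ( votes.choice, voter_flags.user_id ) are made
up for illustration, they are not from the original schema, and it
assumes the vote and the flag are inserted in the same top-level
transaction with no savepoints in between.

```sql
-- Hypothetical schema: votes(choice) and voter_flags(user_id).
-- Rows inserted by the same transaction carry the same xmin, so a join
-- on that system column pairs each vote with the flag written with it.
SELECT f.user_id, v.choice
FROM voter_flags AS f
JOIN votes AS v ON v.xmin = f.xmin;

-- ctid is the physical (block, offset) location of a row version. In a
-- table that is mostly appended to, it tends to follow insertion order,
-- so sorting both tables by ctid can line them up even without xmin.
SELECT ctid, xmin, choice FROM votes ORDER BY ctid;
SELECT ctid, xmin, user_id FROM voter_flags ORDER BY ctid;
```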
Your problem is going to be hard to solve without taking extra steps. I
think doing a transaction which moves all the votes for a period ( using
insert into with the result of a delete returning ) and then inserts
them back ( with something like an insert into of a select order by
random ) may work ( you may even throw a shuffled flag along the way ).
And then throw in a vacuum so the next batch of inserts overwrites the
freed space.
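
A rough sketch of that shuffle, not a tested recipe. The names votes,
choice, period_id and moved_votes, and the period value 42, are
assumptions for the example. It relies on INSERT ... SELECT writing rows
in the order the SELECT produces them, which is what happens in practice
but is not something the SQL standard promises.

```sql
BEGIN;

-- Temporary holding area, dropped automatically at commit.
CREATE TEMP TABLE moved_votes (
    choice    text    NOT NULL,
    period_id integer NOT NULL
) ON COMMIT DROP;

-- Move the whole period out of the table in one statement:
-- the DELETE's RETURNING output feeds the INSERT.
WITH removed AS (
    DELETE FROM votes
    WHERE period_id = 42
    RETURNING choice, period_id
)
INSERT INTO moved_votes (choice, period_id)
SELECT choice, period_id FROM removed;

-- Put the votes back in random order. Every reinserted row now has the
-- xmin of this transaction and a ctid unrelated to its original position.
INSERT INTO votes (choice, period_id)
SELECT choice, period_id
FROM moved_votes
ORDER BY random();

COMMIT;

-- VACUUM cannot run inside a transaction block. It marks the dead row
-- versions (which still carry the old xmin) as reusable space.
VACUUM votes;
```

Note that the old row versions stay in the table until VACUUM can remove
them, which only happens once no open transaction can still see them.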
But for someone with the appropriate access to the system, partial
deanonymization is possible unless you take very good measures. Think
of it, here in Spain we use ballot boxes. But voter order is recorded
( they do a double entry check, you get searched in an alphabetic list,
your name is copied on a time ordered list, and your position on the
list is recorded in the alphabetic one, all on paper, nice system, easy
to audit, hard to cheat ). If you can freeze time, you can carefully
pick up votes from the box and partially correlate them with the list;
even with boxes much larger than the voting envelopes, they tend to
stack in a nice order. And this is with paper, computers are much
better at purposelessly ordering everything because it is easier to do
it this way.
> Does anyone have a suggestion better than this? Is there any chance this changes anytime soon? Should I post this to -hackers?
Something which may be useful is to use a staging table for newly
inserted votes and move them in batches, shuffling them, to a more
permanent one periodically, and use a view to join them. You can even
do that with some fancy partitioning and an extra field. And move
some users' already-voted flags too, in a different transaction. Doing
some of these things and adding some old votes to the moving sets
should make things difficult to track, but it all depends on how
hard your anonymization requirements are ( I mean, the paper system
I've described leaves my vote perfectly identifiable when I've just
voted, but it is regarded as a non issue in general, and I suspect any
system you can think of leaves the last vote identifiable for a finite
amount of time ). In general, move data around, in single transactions
so you do not lose anything, like shaking a ballot box periodically (
but ensure the lid is properly taped first ).
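
Something along these lines is what I have in mind for the staging
variant. It is only a sketch: votes_staging, votes_final, all_votes and
the single choice column are invented names, and mixing some old votes
into each batch is left out to keep it short.

```sql
-- New votes are inserted into the staging table only.
CREATE TABLE votes_staging (choice text NOT NULL);
CREATE TABLE votes_final   (choice text NOT NULL);

-- The application reads through the view and never sees the split.
CREATE VIEW all_votes AS
    SELECT choice FROM votes_final
    UNION ALL
    SELECT choice FROM votes_staging;

-- Periodic batch job: one statement, so a vote is never lost or counted
-- twice. Rows arrive in votes_final in random order with a shared xmin.
BEGIN;
WITH batch AS (
    DELETE FROM votes_staging
    RETURNING choice
)
INSERT INTO votes_final (choice)
SELECT choice FROM batch
ORDER BY random();
COMMIT;

-- Outside the transaction: let the emptied staging space be reused.
VACUUM votes_staging;
```

Votes inserted into the staging table while the batch runs are not
visible to the DELETE and simply wait for the next batch.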
Francisco Olarte.