From: | "Stephen Hutchings" <Stephen(dot)Hutchings(at)jbaconsulting(dot)co(dot)uk> |
---|---|
To: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Checking for data changes across a very large table |
Date: | 2010-12-10 14:21:10 |
Message-ID: | 5503C113FD3EF745927F96031B470A0B39111B701B@mail-jba1.jbanorthwest.co.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I'd like some general guidance on a security issue please. This may belong in the another list so please push me there if appropriate.
We have an application design which includes a potential 2 billion row table (A). When the application user kicks off an analysis process, there is a requirement to perform a check on that table to verify that the data within hasn't changed.
The initial intent was to hold a single "checksum" or equivalent in another small table (B). This checksum value would have been calculated over a subset of row values held within the 2bn row table. When the user chooses to kick off their processing, the application would re-calculate this checksum for table A in real time and compare the new result with the original value stored in table B. We might obtain the subset from table A like this:
SELECT * FROM (
SELECT *,
(row_number() OVER( ORDER BY "ID"))%10000000 AS rn
FROM "TableA") aa
WHERE aa.rn=1 LIMIT 20000
... and then perform an operation on column data from that subset (in this example, a 100,000th of the original data) to calculate a checksum value. Obviously the data within the subset would have to have been affected in order for a difference to be detected.
It is a requirement that users have to have direct access to the DB (I hate this, but am overruled). I envisage that non-admin users would have read-only access, and the above validation would be an extra check on the data.
But my main concern is performance - I fear that this won't be able to perform (in a few seconds rather than minutes). Does the general approach make any sense? Can anyone suggest a better starting point?
This is Postgres 9.0. The minimum hardware spec is small and Windows-based (64-bit, Core i7 processor, 8GB RAM, single 1TB hard disk). My evaluation DB has ended up with the following config tweaks:
shared_buffers 2048 MB
work_mem 2047 MB
checkpoint_segments 128
Thanks in advance.
http://www.jbaconsulting.co.uk/
JBA Consulting
South Barn
Broughton Hall
Skipton
North Yorkshire
BD23 3AE
United Kingdom
http://www.jbaconsulting.co.uk/?q=nce-winner-2010
t: +44 (0)1756 799919 | f: +44 (0)1756 799449
JBA is a Carbon Neutral Company. Please don't print this e-mail unless you really need to.
This email is covered by JBA Consulting's http://www.jbaconsulting.co.uk/emaildisclaimer.
From | Date | Subject | |
---|---|---|---|
Next Message | Scott Marlowe | 2010-12-10 14:24:23 | Re: Performance tuning in Pgsql |
Previous Message | tv | 2010-12-10 13:35:31 | Re: Tuning Postgres for single user manipulating large amounts of data |