From: | Eduardo Morras <emorras(at)s21sec(dot)com> |
---|---|
To: | "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Massive table (500M rows) update nightmare |
Date: | 2010-01-08 11:19:38 |
Message-ID: | 20100108112431.46277558B18@s21sec.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
At 08:17 07/01/2010, Carlo Stonebanks wrote:
>Our DB has an audit table which is 500M rows and growing. (FYI the objects being audited are grouped semantically, not individual field values).
>
>Recently we wanted to add a new feature and we altered the table to add a new column. We are backfilling this varchar(255) column by writing a TCL script to page through the rows (where every update is a UPDATE ... WHERE id >= x AND id < x+10 and a commit is performed after every 1000 updates statement, i.e. every 10000 rows.)
>
>We have 10 columns, six of which are indexed. Rough calculations suggest that this will take two to three weeks to complete on an 8-core CPU with more than enough memory.
>
>As a ballpark estimate - is this sort of performance for an 500M updates what one would expect of PG given the table structure (detailed below) or should I dig deeper to look for performance issues?
>
>As always, thanks!
>
>Carlo
You can dump the table with pg_dumpand get a file like this (only a few columns, not all of them). Note that the last line, has the data in TSV (Tab Separate Format) plus a LF.
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE mdx_core.audit_impt
(
audit_impt_id serial NOT NULL,
impt_session integer,
impt_version character varying(255),
}
COPY mdx_core.audit_impt (audit_impt_id, impt_session, impt_version) FROM stdin;
1 1 tateti
2 32 coll
You can add a new column hacking it, just adding the new column to the schema, the name in the copy statement and a tabulator+data at the end of the line (before the LF). Note that the table name is different from the original.
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE mdx_core.audit_impt2
(
audit_impt_id serial NOT NULL,
impt_session integer,
impt_version character varying(255),
source_table character varying(255)
}
COPY mdx_core.audit_impt2 (audit_impt_id, impt_session, impt_version, source_table) FROM stdin;
1 1 tateti tentown
1 32 coll krof
After this, add indexes, constraints as usual.
HTH
--------------------------------
Eduardo Morrás González
Dept. I+D+i e-Crime Vigilancia Digital
S21sec Labs
Tlf: +34 902 222 521
Móvil: +34 555 555 555
www.s21sec.com, blog.s21sec.com
Salvo que se indique lo contrario, esta información es CONFIDENCIAL y
contiene datos de carácter personal que han de ser tratados conforme a la
legislación vigente en materia de protección de datos. Si usted no es
destinatario original de este mensaje, le comunicamos que no está autorizado
a revisar, reenviar, distribuir, copiar o imprimir la información en él
contenida y le rogamos que proceda a borrarlo de sus sistemas.
Kontrakoa adierazi ezean, posta elektroniko honen barruan doana ISILPEKO
informazioa da eta izaera pertsonaleko datuak dituenez, indarrean dagoen
datu pertsonalak babesteko legediaren arabera tratatu beharrekoa. Posta
honen hartzaile ez zaren kasuan, jakinarazten dizugu baimenik ez duzula
bertan dagoen informazioa aztertu, igorri, banatu, kopiatu edo inprimatzeko.
Hortaz, erregutzen dizugu posta hau zure sistemetatik berehala ezabatzea.
Antes de imprimir este mensaje valora si verdaderamente es necesario. De
esta forma contribuimos a la preservación del Medio Ambiente.
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2010-01-08 14:11:19 | Re: Massive table (500M rows) update nightmare |
Previous Message | Tore Halvorsen | 2010-01-08 11:03:55 | FusionIO performance |