From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: INSERT times - same storage space but more fields -> much slower inserts
Date: 2009-04-14 07:54:40
Message-ID: 49E44140.1060907@postnewspapers.com.au
Lists: pgsql-performance

Hi

I've been doing some testing for the Bacula project, which uses
PostgreSQL as one of the databases in which it stores backup catalogs.

Insert times are critical in this environment, as the app may insert
millions of records a day.

I've been evaluating a schema change for Bacula that takes a field
that's currently stored as a gruesome-to-work-with base64-encoded
representation of a binary blob, and expands it into a set of integer
fields that can be searched, indexed, etc.
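
For instance, the expanded form would allow things like the following
(hypothetical index and query, purely illustrative; the columns are
from the expanded schema further down):

CREATE INDEX file_st_mtime_idx ON file (st_mtime);

SELECT fileid, st_size
FROM file
WHERE st_mtime > 1230000000   -- changed since some cutoff
  AND st_size  > 1048576;     -- and larger than 1MB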

The table size of the expanded form is marginally smaller than for the
base64-encoded string version. However, INSERT times are *CONSIDERABLY*
greater for the version with more fields. It takes 1011 seconds to
insert the base64 version, vs 1290 seconds for the expanded-fields
version. That's a difference of 279 seconds, or 27%.

Despite that, the final table sizes are essentially the same.

The SQL dump for the base64 version is 1734MB and the expanded one is
2189MB, about a 25% increase. Given that the final table sizes are the
same, is the slowdown likely to just be the cost of parsing the extra
SQL, converting the textual representations of the numbers, etc?
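
One way to separate per-statement parse/plan cost from the rest would
be to PREPARE the insert once, so the INSERT itself is parsed and
planned a single time (sketch only; the values are dummies and the 21
parameters match the expanded schema further down):

PREPARE file_ins AS
    INSERT INTO file VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10,
                             $11, $12, $13, $14, $15, $16, $17, $18,
                             $19, $20, $21);

EXECUTE file_ins (1, 0, 1, 1, 1, 0, 2049, 123456, 33188, 1, 1000, 1000,
                  0, 4096, 4096, 8, 1239000000, 1239000000, 1239000000,
                  0, 'dummy-md5');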

If I use tab-separated input and COPY, the original-format file is
1300MB and the expanded-structure format is 1618MB. The performance hit
on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
quite significant.
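
For reference, the COPY-based loads were of roughly this form (the path
is illustrative; COPY's default text format is tab-delimited):

COPY file FROM '/tmp/file_data.tsv';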

Any ideas about what I might be able to do to improve the efficiency of
inserting records with many integer fields?
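
For example, would batching several rows per INSERT statement (the
multi-row VALUES syntax available since 8.2) help much over one row per
statement? Something like (dummy values, 21 columns as in the expanded
schema below):

INSERT INTO file VALUES
    (1, 0, 1, 1, 1, 0, 2049, 111, 33188, 1, 0, 0, 0, 1024, 4096, 8,
     1239000000, 1239000000, 1239000000, 0, 'dummy-md5-1'),
    (2, 0, 1, 1, 2, 0, 2049, 112, 33188, 1, 0, 0, 0, 2048, 4096, 8,
     1239000000, 1239000000, 1239000000, 0, 'dummy-md5-2');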

In case it's of interest, the base64 and expanded schemas are:

-- Current schema: lstat packed into a single base64-encoded text column
CREATE TABLE file (
    fileid     bigint NOT NULL,
    fileindex  integer DEFAULT 0 NOT NULL,
    jobid      integer NOT NULL,
    pathid     integer NOT NULL,
    filenameid integer NOT NULL,
    markid     integer DEFAULT 0 NOT NULL,
    lstat      text NOT NULL,
    md5        text NOT NULL
);

-- Proposed schema: lstat expanded into individual integer columns
CREATE TABLE file (
    fileid     bigint,
    fileindex  integer,
    jobid      integer,
    pathid     integer,
    filenameid integer,
    markid     integer,
    st_dev     integer,
    st_ino     integer,
    st_mod     integer,
    st_nlink   integer,
    st_uid     integer,
    st_gid     integer,
    st_rdev    bigint,
    st_size    integer,
    st_blksize integer,
    st_blocks  integer,
    st_atime   integer,
    st_mtime   integer,
    st_ctime   integer,
    linkfi     integer,
    md5        text
);

(Yes, those are the fields of the `struct stat' that lstat() fills in.)

--
Craig Ringer
