Re: Why is my database so big?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: "FarjadFarid(ChkNet)" <farjad(dot)farid(at)checknetworks(dot)com>
Cc: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Andrew Smith' <laconical(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Why is my database so big?
Date: 2016-02-22 14:58:08
Message-ID: 20160222145808.GB13092@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

All,

* FarjadFarid(ChkNet) (farjad(dot)farid(at)checknetworks(dot)com) wrote:
> Tom, thanks for your unbiased detailed response.
>
> Interesting post.

Please don't top-post. My comments are in-line, below.

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: 22 February 2016 05:06
> To: Andrew Smith
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Why is my database so big?
>
> Andrew Smith <laconical(at)gmail(dot)com> writes:
> > I am setting up a proof of concept database to store some historical data.
> > Whilst I've used PostgreSQL a bit in the past this is the first time
> > I've looked into disk usage due to the amount of data that could
> > potentially be stored. I've done a quick test and I'm a little
> > confused as to why it is occupying so much space on disk. Here is my table
> definition:
>
> > CREATE TABLE "TestSize"
> > (
> > "Id" integer NOT NULL,
> > "Time" timestamp without time zone NOT NULL,
> > "Value" real NOT NULL,
> > "Status" smallint NOT NULL,
> > PRIMARY KEY ("Id", "Time")
> > );
>
> > CREATE INDEX test_index ON "TestSize" ("Id");

Note that you don't really need an index on "Id" because including a
primary key will automatically include an index on those fields, and an
index over ("Id", "Time") can be used to satisfy queries which have a
conditional on just the "Id" column. Removing that extra index will
likely help with space issues.

> > With a completely empty table the database is 7 MB. After I insert 1
> > million records into the table the database is 121 MB. My
> > understanding is that each of the fields is sized as follows:
>
> > integer - 4 bytes
> > timestamp without time zone - 8 bytes
> > real - 4 bytes
> > smallint - 2 bytes

I'd recommend against using timestamp w/o time zone. For starters, as
noted, it's not actually saving you any space over timestamp w/ time
zone, and second, it makes working with that field painful and prone to
error.

> The long and the short of it is that Postgres is more oriented to OLTP-style
> applications where access to and update of individual rows is the key
> performance metric. If you're dealing with historical data and mainly want
> aggregated query results, it's possible you'd get better performance and
> more-compact storage from a column-store database.
>
> There's ongoing investigation into extending Postgres to support
> column-style storage for better support of applications like that; but any
> such feature is probably several years away, and it will not come without
> performance compromises of its own.

One approach to dealing with the PG per-row overhead is to use arrays
instead of rows, when you don't need that per-row visibility
information. I've found that to be very successful for single-column
tables, but the technique may work reasonably well for other structures
also.

Thanks!

Stephen

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Seamus Abshere 2016-02-22 15:18:41 Why does query planner choose slower BitmapAnd ?
Previous Message FarjadFarid(ChkNet) 2016-02-22 14:14:00 Re: Why is my database so big?