Re: Storage consumption

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Troels Arvin <troels(at)arvin(dot)dk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storage consumption
Date: 2003-11-14 18:26:12
Message-ID: 200311141826.hAEIQCD23265@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Did you see the FAQ item on estimating disk space?

---------------------------------------------------------------------------

Troels Arvin wrote:
> Hello,
>
> For some very data-intensive projects it's interesting how much space the
> DBMS uses for the storage of data, so I'm investigating how space
> efficient different DBMSes are.
>
> In the PostgreSQL manual, it's written that values of the type INTEGER
> take op four bytes. I was curious about how close to real-World this
> number is, so I did a test: How much space does PostgreSQL use when
> storing 100000 rows where each row consists of a single INTEGER value?
>
> With help from http://random.org/ I created a file with 100000 random
> integer insertions. The SQL used to do that is available at
> http://troels.arvin.dk/db/tests/storage-sizes/randomints.zip
>
> About installation: PostgreSQL v. 7.3.4 on Red Hat Linux 9, file system
> ext3. PostgreSQL data-area in /var/lib/pgsql/data.
>
> For this test, PostgreSQL is being used for nothing else.
>
> Before test start:
> -----------------
> Access to a default database ('psql' brings you right into a working
> database) from psql.
> Access to do a 'du' (disk usage unix-command) on /var/lib/pgsql/data from
> the command line.
> No existing table 'inttab' in database. PostgreSQL stopped.
>
> Test starts.
> -----------
> Output of 'du -sb /var/lib/pgsql/data': 77946519.
> Start PostgreSQL.
> Do: "CREATE TABLE inttab (intval INT) WITHOUT OIDS;"
> psql -q -f random_ints.sql
> (Wait for a long time.)
> Do: "VACUUM FULL;"
> Shut down PostgreSQL.
> Output of 'du -sb /var/lib/pgsql/data': 81190551.
>
> Result:
> ------
> Real difference: 81190551-77946519 = 3244032
> Optimal difference: 100000*4 = 400000
> Storage consumption rate ((real/optimal)*100)% = 811%
>
> I'm surprised by an overhead _that_ high. Any comments on my methology?
> Does it need adjustments? If you think it's rotten: What methology would
> you use to measure space overhead for a DBMS? (Again: Space overhead is
> seldomly interesting, but sometimes it is.)
>
> I guess that transaction log files are a joker in this context, but then
> again: A number which reflects the DBMS' disk usage before and after an
> operation does have real-World meaning, I think.
>
> (Of course, I'll need another methology for DBMSes which preallocate a
> fixed amount of storage for a database.)
>
> --
> Greetings from Troels Arvin, Copenhagen, Denmark
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff 2003-11-14 18:36:26 Re: Updated Documentation
Previous Message Rajesh Kumar Mallah 2003-11-14 18:13:26 Re: DOMAIN usability