Re: Help estimating database and WAL size

From: "Daniel Serodio (lists)" <daniel(dot)lists(at)mandic(dot)com(dot)br>
To: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Help estimating database and WAL size
Date: 2012-10-15 20:59:51
Message-ID: 507C7947.1090803@mandic.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jasen Betts wrote:
> On 2012-10-08, Daniel Serodio (lists)<daniel(dot)lists(at)mandic(dot)com(dot)br> wrote:
>> We are preparing a PostgreSQL database for production usage and we need
>> to estimate the storage size for this database. We're a team of
>> developers with low expertise on database administration, so we are
>> doing research, reading manuals and using our general IT knowledge to
>> achieve this.
>>
>> We have actual data to migrate to this database and some rough
>> estimations of growth. For the sake of the example, let's say we have a
>> estimation of growth of 50% per year.
>>
>> The point is: what's the general proper technique for doing a good size
>> estimation?
>>
>> We are estimating the storage usage by the following rules. Topics where
>> we need advice are marked with ** asterisks **. Feedback on the whole
>> process is more than welcome.
>>
>> 1) Estimate the size of each table
>> 1.1) Discover the actual size of each row.
>> - For fields with a fixed size (like bigint, char, etc) we used
>> the sizes described in the documentation
>> - For fields with a dynamic size (like text) we estimated the
>> string length and used the function select pg_column_size('expected text
>> here'::text)
>
> long text is subject to compression, pg_column_size doesn't seem to
> test compression, compression is some sort of LZ..
Interesting, I didn't know about automatic compression. I've just read
the section on TOAST and haven't been able to answer this either: Is
there any way to check for the compressed size?
>> - We added 4 more bytes for the OID that PostgreSQL uses internally
>
> OID is optional, IIRC PGXID is not
I hadn't heard of PGXID, I've just searched Google but found no
reference to this term except for this e-mail thread and some source
code. What is PGXID? Where can I learn more about hit?
>> 1.2) Multiply the size of each row by the number of estimated rows
>> ** Do I need to consider any overhead here, like row or table
>> metadata? **
>
> page size 8K
> column overhead 1 byte per not-NULL column, NULLs are free,
>
>> 2) Estimate the size of each table index
>> ** Don't know how to estimate this, need advice here **
>
> IIRC
> ( data being indexed + 8 bytes ) / fill factor
>
>> 3) Estimate the size of the transaction log
>> ** We've got no idea how to estimate this, need advice **
>
> how big are your transactions?
Very short, a couple of statements each.
>> 4) Estimate the size of the backups (full and incremental)
>> ** Don't know how to estimate this, need advice here **
>
> depends on the format you use, backups tend to compress well.
>
>> 5) Sum all the estimates for the actual minimum size
>
> no, you get estimated size.
Thanks a lot for the response.

Regards,
Daniel Serodio
>> 6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates
>> 1, 2 and 4 for the minimum size after 1 year
>>
>> 7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5
>> and 6 for a good safety margin
>>
>> I know the rules got pretty extensive, please let me know if you need
>> more data or examples for a better understanding.
>>
>> We've also posted this question to
>> http://dba.stackexchange.com/q/25617/10166

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Serodio (lists) 2012-10-15 21:03:40 Re: Help estimating database and WAL size
Previous Message Magnus Hagander 2012-10-15 20:38:01 Re: Is pg_basebackup also for "regular" backups?