From: | Zoolin Lin <zoolin3g(at)yahoo(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Any advantage to integer vs stored date w. timestamp |
Date: | 2007-03-07 21:45:42 |
Message-ID: | 20070307214542.10197.qmail@web63302.mail.re1.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Thank you for the reply
>> Primary table is all integers like:
>>
>> date id | num1 | num2 | num3 | num4 | num5 | num6 | num7 | num 8
>> -------------------------------------------------------------------------------------------------
>> primary key is on date to num->6 columns
>
>>> What types are num1->8?
>
> They are all integer
>>Hmm - not sure if you'd get any better packing if you could make some
>>int2 and put them next to each other. Need to test.
Thanks, I find virtually nothing on the int2 column type? beyond brief mention here
http://www.postgresql.org/docs/8.2/interactive/datatype-numeric.html#DATATYPE-INT
Could i prevail on you to expand on packing wtih int2 a bit more, or point me in the right direction for documentation?
If there's some way I can pack multipe columns into one to save space, yet still effectively query on them, even if it's a lot slower, that would be great.
My current scheme, though as normalized and summarized as I can make it, really chews up a ton of space. It might even be chewing up more than the data files i'm summarizing, I assume due to the indexing.
Regading saving disk space, I saw someone mention doing a custom build and changing
TOAST_TUPLE_THRESHOLD/TOAST_TUPLE_TARGET
So data is compressed sooner, it seems like that might be a viable option as well.
http://www.thescripts.com/forum/thread422854.html
> 2,500 rows per hour, with duplicate date columns, seems like it could
> add up though.
>>>Well, let's see 2500*24*365 = 21,900,000 * 4 bytes extra = 83MB
>>>additional storage over a year. Not sure it's worth worrying about.
Ahh yes probably better to make it a date w. timestamp column then.
Z
Richard Huxton <dev(at)archonet(dot)com> wrote: Zoolin Lin wrote:
> thanks for your reply
>
>> Primary table is all integers like:
>>
>> date id | num1 | num2 | num3 | num4 | num5 | num6 | num7 | num 8
>> -------------------------------------------------------------------------------------------------
>> primary key is on date to num->6 columns
>
>>> What types are num1->8?
>
> They are all integer
Hmm - not sure if you'd get any better packing if you could make some
int2 and put them next to each other. Need to test.
>> date_id | date w timestamp ----------------------------------------
>> 1 | 2007-2-15 Midnight 2 | 2007-2-15 1 am 3 |
>> 2007-2-15 2 am etc for 24 hours each day
>
>>> If you only want things accurate to an hour, you could lost the
>>> join and just store it as an int: 2007021500, 2007021501 etc.
>
> Hmm yeh I could, I think with the amount of data in the db though it
> behooves me to use one of the date types, even if via lookup table.
You can always create it as a custom ZLDate type. All it really needs to
be is an int with a few casts.
> So I guess I'm just not sure if I'm really gaining anything by using
> an integer date id column and doing a join on a date lookup table,
> vs just making it a date w. timestamp column and having duplicate
> dates in that column.
>
> I would imagine internally that the date w. timestamp is stored as
> perhaps a time_t type plus some timezone information. I don't know
> if it takes that much more space, or there's a significant
> performance penalty in using it
It's a double or int64 I believe, so allow 8 bytes instead of 4 for your
int.
> 2,500 rows per hour, with duplicate date columns, seems like it could
> add up though.
Well, let's see 2500*24*365 = 21,900,000 * 4 bytes extra = 83MB
additional storage over a year. Not sure it's worth worrying about.
--
Richard Huxton
Archonet Ltd
---------------------------------
No need to miss a message. Get email on-the-go
with Yahoo! Mail for Mobile. Get started.
From | Date | Subject | |
---|---|---|---|
Next Message | Shane Ambler | 2007-03-08 02:49:33 | Re: Any advantage to integer vs stored date w. timestamp |
Previous Message | Stefan Kaltenbrunner | 2007-03-07 18:48:44 | Re: [kris@obsecurity.org: Progress on scaling of FreeBSD on 8 CPU systems] |