From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-01 22:46:07
Message-ID: 1130885167.8300.1702.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches
On Tue, 2005-11-01 at 16:54 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > varlen is int32 to match the standard varlena header. However, the max
> > number of digits of the datatype is less than the threshold at which
> > values get toasted. So no NUMERIC values ever get toasted - in which
> > case, why worry about matching the size of varlena - let's reduce it to 2
> > bytes, which still gives us up to 1000 digits as we have now.
>
> Because that will require an extra case in the code that disassembles
> tuples, which will slow down *everything* even in databases that don't
> contain one single NUMERIC value. I think you need more than "let's
> save 2 bytes in NUMERICs" to justify that.
OK
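(For reference, the struct under discussion looks like this in
src/backend/utils/adt/numeric.c, approximately as of the 8.1 era; the
size comments are mine:)

    typedef struct NumericData
    {
        int32        varlen;        /* 4 bytes: standard varlena header */
        int16        n_weight;      /* 2 bytes: weight of first digit,
                                     * counted in base-10000 digits */
        uint16       n_sign_dscale; /* 2 bytes: sign flags + display scale */
        NumericDigit n_data[1];     /* the digits: one int16 per
                                     * base-10000 digit */
    } NumericData;

so every NUMERIC value carries 8 bytes of fixed overhead before any
digits.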
> > n_weight seems to exist because we do not store trailing zeroes. So
> > 1000000 is stored as 1 with a weight of 6. My experience is that large
> > numbers of trailing zeroes do not occur with any frequency in real
> > measurement or financial data and that this is an over-optimization.
>
> That seems debatable. Keep in mind that not storing extra zeroes cuts
> computation time as well as storage.
Check what % difference this makes. A 2-byte saving on every value
makes more difference than a 1-byte saving on the few percent of values
that have trailing zeroes.
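(To make the trade-off concrete, here is how 1000000 is actually stored
in the base-10000 format, where each NumericDigit is an int16 holding
0..9999:

    1000000 = 100 * 10000^1

    with suppression:    digits[] = { 100 },    n_weight = 1   (2 bytes)
    without suppression: digits[] = { 100, 0 }, n_weight = 1   (4 bytes)

so the optimization saves 2 bytes per run of four trailing decimal
zeroes, and only on values that have them.)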
> > n_sign_dscale shows us where the decimal point is. We could actually
> > store a marker representing the decimal point, which would cost us 0.5
> > byte rather than 2 bytes. Since a decimal digit needs only 10 of the
> > 16 values a 4-bit nibble can hold, that leaves a few codes spare to
> > represent either a decimal-point-and-positive-sign or a decimal-
> > point-and-negative-sign marker. (We would still need to store trailing
> > zeroes even after the decimal point.)
>
> This is completely bogus. How are you going to remember the sign except
> by always storing a marker? ISTM this proposal just moves the
> sign/decimalpoint overhead from one place to another, ie, somewhere in
> the NumericDigit array instead of in a fixed field.
That is exactly my proposal: an overhead of 0.5 bytes rather than 2
bytes, as I explained... but
> Also, you can't drop dscale without abandoning the efficient base-10000
> representation, at least not unless you want people complaining that the
> database shows NUMERIC(3) data with four decimal places.
... I take it I have misunderstood the storage format.
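(Concretely: under the base-10000 format, fractional digits are stored
in groups of four, so dscale is what pins the displayed precision. For
example:

    1.500 as NUMERIC(4,3):  digits[] = { 1, 5000 },  n_weight = 0

    the digit 5000 alone cannot say whether to print 1.5, 1.50, 1.500
    or 1.5000; dscale = 3 selects 1.500.)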
> It might be reasonable to restrict the range of NUMERIC to the point
> that we could fit the weight/sign/dscale into 2 bytes instead of 4,
> thereby saving 2 bytes per NUMERIC. I'm not excited about the other
> aspects of this, though.
That seems easily doable - it seemed like something would stick.
Restricting the total number of digits to 255 and the max scale to 254
would allow that saving, yes?
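(One purely illustrative way to pack that into 16 bits; the names and
bit layout here are invented, not anything agreed in this thread:

    /*
     *   bit  15     sign (0 = positive, 1 = negative)
     *   bits 14..8  weight, biased by 64: -64..63 base-10000 digits,
     *               roughly the 255-decimal-digit range above
     *   bits  7..0  dscale: 0..255 decimal places
     *
     * NaN would need a reserved encoding, e.g. an otherwise-impossible
     * weight/dscale combination.
     */
    #define NUMERIC2_SIGN(h)    (((h) >> 15) & 0x01)
    #define NUMERIC2_WEIGHT(h)  (((int) (((h) >> 8) & 0x7F)) - 64)
    #define NUMERIC2_DSCALE(h)  ((h) & 0xFF)
)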
We can then have a BIGNUMERIC which would allow up to 1000 digits for
anybody out there that ever got that high. I'm sure there are a few %,
so I won't dismiss you entirely, guys...
Best Regards, Simon Riggs