Re: Reducing the overhead of NUMERIC data

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-01 22:46:07
Message-ID: 1130885167.8300.1702.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On Tue, 2005-11-01 at 16:54 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > varlen is int32 to match the standard varlena header. However, the max
> > number of digits of the datatype is less than the threshold at which
> > values get toasted. So no NUMERIC values ever get toasted - in which
> > case, why worry about matching the size of varlena - let's reduce it to 2
> > bytes which still gives us up to 1000 digits as we have now.
>
> Because that will require an extra case in the code that disassembles
> tuples, which will slow down *everything* even in databases that don't
> contain one single NUMERIC value. I think you need more than "let's
> save 2 bytes in NUMERICs" to justify that.

OK
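
For anyone following along, the layout under discussion is (paraphrased -
see src/include/utils/numeric.h; NumericDigit is an int16 holding one
base-10000 digit):

typedef struct NumericData
{
    int32        varlen;         /* standard varlena length header */
    int16        n_weight;       /* weight of first digit, in base-10000 units */
    uint16       n_sign_dscale;  /* sign flags + display scale */
    NumericDigit n_data[1];      /* the digits themselves */
} NumericData;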

> > n_weight seems to exist because we do not store trailing zeroes. So
> > 1000000 is stored as 1 with a weight of 6. My experience is that large
> > numbers of trailing zeroes do not occur with any frequency in real
> > measurement or financial data and that this is an over-optimization.
>
> That seems debatable. Keep in mind that not storing extra zeroes cuts
> computation time as well as storage.

Check what % difference this makes. Saving 2 bytes on every value makes
more difference than a 1-byte saving on the few percent of values that
actually have trailing zeroes.
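
To make sure we're talking about the same thing, here is a toy sketch
(not backend code) of how I understand the weight to suppress trailing
zeroes under the base-10000 representation. In base 10000, 1000000 is a
single digit 100 with weight 1 - the base-10000 analogue of the "1 with
a weight of 6" I described above:

#include <stdio.h>

int
main(void)
{
    long long   val = 1000000;  /* the example value from above */
    short       digits[8];      /* base-10000 digits, least significant first */
    int         n = 0;

    while (val > 0)
    {
        digits[n++] = (short) (val % 10000);
        val /= 10000;
    }

    /*
     * 1000000 = 100 * 10000^1, so the digits come out as {0, 100}; the
     * low-order zero digit need not be stored, because the weight of
     * the most significant digit records where it sits.
     */
    int         weight = n - 1;
    int         first = 0;

    while (first < n - 1 && digits[first] == 0)
        first++;

    /* prints "stored 1 digit(s), weight 1" */
    printf("stored %d digit(s), weight %d\n", n - first, weight);
    return 0;
}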

> > n_sign_dscale shows us where the decimal point is. We could actually
> > store a marker representing the decimal point, which would cost us 0.5
> > byte rather than 2 bytes. Since we have 4 bits to represent a decimal
> > number, that leaves a few spare codes to represent either a decimal-
> > point-and-positive-sign or a decimal-point-and-negative-sign marker. (We would
> > still need to store trailing zeroes even after the decimal point).
>
> This is completely bogus. How are you going to remember the sign except
> by always storing a marker? ISTM this proposal just moves the
> sign/decimalpoint overhead from one place to another, ie, somewhere in
> the NumericDigit array instead of in a fixed field.

That is exactly my proposal. Thus an overhead of 0.5 bytes rather than 2
bytes, as I explained... but

> Also, you can't drop dscale without abandoning the efficient base-10000
> representation, at least not unless you want people complaining that the
> database shows NUMERIC(3) data with four decimal places.

... I take it I have misunderstood the storage format.
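
For the archives, the nibble coding I had in mind was along these lines
(purely hypothetical - just to show where the 0.5-byte figure comes
from):

/* hypothetical 4-bit codes for the packed-decimal idea */
enum nibble_code
{
    /* 0x0 .. 0x9 encode the decimal digits themselves */
    NB_POINT_POS = 0xA,         /* decimal point here, value positive */
    NB_POINT_NEG = 0xB          /* decimal point here, value negative */
    /* 0xC .. 0xF spare */
};

So -12.5 would pack as the nibbles {1, 2, 0xB, 5}, i.e. the two bytes
0x12 0xB5, and the sign/decimal-point marker costs half a byte.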

> It might be reasonable to restrict the range of NUMERIC to the point
> that we could fit the weight/sign/dscale into 2 bytes instead of 4,
> thereby saving 2 bytes per NUMERIC. I'm not excited about the other
> aspects of this, though.

That seems easily doable - I figured at least one of these ideas would
stick.

Restricting the total number of digits to 255 and the maximum scale to
254 would allow that saving, yes?
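
Something like this, say (one possible packing, purely illustrative -
the bit assignments are mine, not a worked-out proposal):

/*
 * Hypothetical 2-byte replacement for the weight/sign/dscale fields:
 *   bit  15     sign (0 = positive, 1 = negative)
 *   bits 14..8  weight of the first digit, biased by 64
 *   bits  7..0  display scale, 0..254 (255 reserved)
 */
#define NUMERIC2_SIGN(h)    (((h) >> 15) & 0x01)
#define NUMERIC2_WEIGHT(h)  ((int) (((h) >> 8) & 0x7F) - 64)
#define NUMERIC2_DSCALE(h)  ((h) & 0xFF)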

We can then have a BIGNUMERIC which would allow up to 1000 digits for
anybody out there who ever gets that high. I'm sure there are a few %, so
I won't dismiss you entirely, guys...

Best Regards, Simon Riggs
