From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Thoughts on NBASE=100000000 |
Date: | 2024-07-07 20:39:46 |
Message-ID: | 041fd562-e279-4457-a118-9fda593c9be9@app.fastmail.com |
Lists: | pgsql-hackers |
Hello hackers,
I'm not hopeful this idea will be fruitful, but maybe we can find solutions
to the problems together.
The idea is to increase the numeric NBASE from 1e4 to 1e8, which could possibly
give a significant performance boost to all operations across the board,
on 64-bit architectures, for many inputs.
The last time numeric's base was changed was back in 2003, when d72f6c75038
changed it from 10 to 10000. Back then, 32-bit architectures were still dominant,
so base-10000 was clearly the best choice.
Today, since 64-bit architectures are dominant, NBASE=1e8 seems like it would
have been the best choice, since the square of that still fits in
a 64-bit signed int.
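To make the overflow argument concrete, here is a minimal sketch in C. The macro `NBASE_1E8` and both helper functions are hypothetical names used only for illustration; they are not taken from numeric.c. It computes the largest product of two base-1e8 digits and how many such worst-case products could be summed in an int64 before a carry step would be needed.

```c
#include <stdint.h>

/* Hypothetical constant for illustration; the current NBASE is 10000. */
#define NBASE_1E8 INT64_C(100000000)

/* Largest possible product of two base-1e8 digits: (NBASE - 1)^2. */
static int64_t
max_digit_product(void)
{
    int64_t max_digit = NBASE_1E8 - 1;

    return max_digit * max_digit;
}

/*
 * Number of worst-case digit products that can be added together
 * in a signed 64-bit accumulator without overflowing.
 */
static int64_t
max_accumulated_products(void)
{
    return INT64_MAX / max_digit_product();
}
```

Under these assumptions the worst-case product is 9999999800000001, comfortably below INT64_MAX, and 922 such products fit in one accumulator before carries must be propagated.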
Changing NBASE might seem impossible at first, due to the existing numeric data
on disk, and incompatibility issues when numeric data is transferred on the
wire.
Here are some ideas on how to work around some of these:
- Incrementally changing the data on disk, e.g. upon UPDATE/INSERT
and supporting both NBASE=1e4 (16-bit) and NBASE=1e8 (32-bit)
when reading data.
- Due to the lack of a version field in the NumericVar struct,
we need a way to detect if a Numeric value on disk uses
the existing NBASE=1e4, or NBASE=1e8.
One hack I've thought about is to exploit the fact that NUMERIC_NBYTES,
defined as:
#define NUMERIC_NBYTES(num) (VARSIZE(num) - NUMERIC_HEADER_SIZE(num))
will always be divisible by two, since a NumericDigit is an int16 (2 bytes).
The idea is then to let "NUMERIC_NBYTES divisible by three"
indicate NBASE=1e8, at the cost of one to three extra padding bytes.
Another important aspect is disk space utilization, which is of course better
for NBASE=1e4, since it packs the data more tightly.
I think this is the main disadvantage of NBASE=1e8, but perhaps users would be
willing to sacrifice some disk, if they would get better run-time performance.
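As a rough illustration of the space trade-off, the sketch below compares digit storage for an integer with a given number of decimal digits. It is a simplification under stated assumptions: the helper names are hypothetical, and the varlena header, the alignment of digits around the decimal point, stripping of zero digits, and any padding needed for format detection are all ignored.

```c
#include <stddef.h>

/*
 * Bytes of digit storage for an integer with ndecimal decimal digits,
 * where each base digit holds dec_per_digit decimal digits and
 * occupies bytes_per_digit bytes.
 */
static size_t
digit_bytes(size_t ndecimal, size_t dec_per_digit, size_t bytes_per_digit)
{
    /* ceiling division: a partly filled base digit still costs a full one */
    size_t ndigits = (ndecimal + dec_per_digit - 1) / dec_per_digit;

    return ndigits * bytes_per_digit;
}

/* NBASE=1e4: 4 decimal digits per 2-byte digit. */
static size_t
bytes_base_1e4(size_t ndecimal)
{
    return digit_bytes(ndecimal, 4, 2);
}

/* NBASE=1e8: 8 decimal digits per 4-byte digit (assumed int32 storage). */
static size_t
bytes_base_1e8(size_t ndecimal)
{
    return digit_bytes(ndecimal, 8, 4);
}
```

Both bases store half a byte per decimal digit when every base digit is full, so the difference comes from rounding up to a coarser unit: a 1-digit integer needs 2 bytes versus 4, a 9-digit integer 6 versus 8, while 8-digit and 16-digit integers cost the same in either base.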
As said initially, this might be completely unrealistic,
but I'm interested to hear if anyone else has had similar dreams.
Regards,
Joel