From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | gry(at)ll(dot)mit(dot)edu, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: 4G row table? |
Date: | 2002-12-19 19:15:20 |
Message-ID: | 200212191115.20391.josh@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
George,
> [linux, 700MHz athlon, 512MB RAM, 700GB 10kRPM SCSI HW RAID, postgresql 7.2]
What kind of RAID? How many drives? Will you be updating the data
frequently, or mostly just running reports on it?
With 4G rows, you will have *heavy* disk access, so the configuration and
quality of your disk array is a big concern. You also might think about
upping th ememory if you can.
> We're setting up a DB of IC test data, which is very simple and regular, but
large.
> One project (we get three or four per year) has ~4 giga bits, each specified
by
> a few small integer values, e.g.:
> Name Type Values
> ----------------------
> wafer int 1-50
> die_row int 2-7
> die_col int 2-7
> testtype string (~10 different short strings)
> vdd int 0-25
> bit_col int 0-127
> bit_row int 0-511
> value bit 0 or 1
>
> with 4 billion(4e9) rows. I would guess to make wafer, die_row, etc. be of
> type "char", probably testtype a char too with a separate testtype lookup
table.
> Even so, this will be a huge table.
1. Use INT2 and not INT for the INT values above. If you can hire a
PostgreSQL hacker, have them design a new data type for you, an unsigned INT1
which will cut your storage space even further.
2. Do not use CHAR for wafer & die-row. CHAR requries min 3bytes storage;
INT2 is only 2 bytes.
3. If you can use a lookup table for testtype, make it another INT2 and create
a numeric key for the lookup table.
> Questions: How much overhead will there be in the table in addition to the
> 9 bytes of data I see?
There's more than 9 bytes in the above. Count again.
> How big will the primary index on the first seven columns
> be? Will this schema work at all?
As large as the 7 columns themselves, plus a little more. I suggest creating
a surrogate key as an int8 sequence to refer to most rows.
> Of course, we could pack 128 bits into an 8 byte "text" field (or should we
use bit(128)?),
> but lose some ease of use, especially for naive (but important) users.
This is also unlikely to be more efficient due to the translation<->conversion
process requried to access the data when you query.
> Comments, suggestions?
Unless you have a *really* good RAID array, expect slow performance on this
hardware platform.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
From | Date | Subject | |
---|---|---|---|
Next Message | Charles H. Woloszynski | 2002-12-19 19:27:25 | Re: 4G row table? |
Previous Message | george young | 2002-12-19 19:10:58 | 4G row table? |