Re: On using doubles as primary keys

From: John McKown <john(dot)archie(dot)mckown(at)gmail(dot)com>
To: Kynn Jones <kynnjo(at)gmail(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: On using doubles as primary keys
Date: 2015-04-17 19:57:35
Message-ID: CAAJSdjgKJy1O+4qCB2HT3AK7E-GjftZ1g119oZCtQdO8mPhs5Q@mail.gmail.com
Lists: pgsql-general

On Fri, Apr 17, 2015 at 10:34 AM, Kynn Jones <kynnjo(at)gmail(dot)com> wrote:

> I have some data in the form of a matrix of doubles (~2 million
> rows, ~400 columns) that I would like to store in a Pg table,
> along with the associated table of metadata (same number of rows,
> ~30 columns, almost all text). This is large enough to make
> working with it from flat files unwieldy.
>
> (The only reason for wanting to transfer this data to a Pg table
> is the hope that it will be easier to work with it by using SQL
> queries to extract subsets of it, than by, e.g., writing programs
> that must scan the entire matrix every time they're run.)
>
> My plan is to add one extra column to the Pg tables to serve as
> the primary key joining the data and the metadata tables
> together.
>
>
I could be wrong, but it seems to me that the records in the two files
have a one-to-one correspondence with each other. That is, record #1 in
file #1 has data related to record #1 in file #2, and there should
therefore always be the same number of records in both files. So my first
thought is: why not just have one table in PostgreSQL which contains the
data merged from the corresponding records in the two files? IMO, this is a
cleaner data design than trying to ensure that two tables are kept "in
sync" as records are added and deleted. Of course, this could possibly hurt
performance, due to reading a lot of data, if you consistently need data
from only one file and only occasionally read the other.
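
For instance, a minimal, untested sketch of that single-table layout (the
table and column names here are just placeholders; the real table would
carry all ~30 metadata and ~400 value columns):

CREATE TABLE observations (
    id          bigserial PRIMARY KEY,
    -- ~30 metadata columns, almost all text
    sample_name text,
    notes       text,
    -- ~400 matrix columns
    v1          double precision,
    v2          double precision
    -- remaining columns omitted for brevity
);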

If you insist on two tables for some reason, then I would have the load
program read a record from each file, incrementing a counter, and INSERT
the data from the files into the tables using the counter as the primary
key. To ease things a bit, assuming the 1:1 correspondence, I would make
the primary key of the secondary table also a foreign key referencing the
more active table, with ON DELETE CASCADE, so that when you DELETE records
from the more active table, the corresponding records in the secondary
table are automatically deleted. You might even want some sort of
CONSTRAINT trigger to do a DELETE on the more active table if a record is
DELETEd from the secondary.
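
Roughly, and untested (I am assuming here that the metadata table is the
"more active" one and calling the value table "matrix"; adjust the names
to whatever fits your data):

CREATE TABLE metadata (
    id          bigint PRIMARY KEY,   -- the load program's counter
    sample_name text
    -- remaining ~30 metadata columns omitted
);

CREATE TABLE matrix (
    id bigint PRIMARY KEY
       REFERENCES metadata (id) ON DELETE CASCADE,
    v1 double precision,
    v2 double precision
    -- remaining ~400 value columns omitted
);

-- Optional: make a DELETE on matrix also remove the metadata row, so the
-- two tables cannot drift apart in either direction.
CREATE FUNCTION matrix_delete_metadata() RETURNS trigger AS $$
BEGIN
    DELETE FROM metadata WHERE id = OLD.id;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER matrix_cascade_up
    AFTER DELETE ON matrix
    FOR EACH ROW EXECUTE PROCEDURE matrix_delete_metadata();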

Also, if you really want to separate the data into more than one table,
then why only two? In this approach, which I admit I personally _dislike_,
you would analyze which columns you access together frequently and put
those column groups into separate tables. Again, the main plus of this
would be reduced memory usage and I/O on the database server side. May Joe
Celko forgive me for introducing physical DB configuration into a data
structure discussion.
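
Again untested, and building on the metadata table from the sketch above;
the "hot"/"cold" split is a pure guess, and only your own access-pattern
analysis can tell you where to cut:

CREATE TABLE matrix_hot (
    id bigint PRIMARY KEY
       REFERENCES metadata (id) ON DELETE CASCADE,
    v1 double precision,
    v2 double precision
    -- the columns you usually read together
);

CREATE TABLE matrix_cold (
    id   bigint PRIMARY KEY
         REFERENCES metadata (id) ON DELETE CASCADE,
    v201 double precision,
    v202 double precision
    -- the rarely touched columns
);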

<snip>

>
> Thanks in advance!
>
> kj
>
>

--
If you sent twitter messages while exploring, are you on a textpedition?

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
