Re: "Shared strings"-style table

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Seamus Abshere <seamus(at)abshere(dot)net>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: "Shared strings"-style table
Date: 2017-10-13 18:16:27
Message-ID: CAKFQuwYGqEVgsZtCozBw46zNQUXtfX-W-vVcxJ89Sjvk7JSXtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Oct 13, 2017 at 9:29 AM, Seamus Abshere <seamus(at)abshere(dot)net> wrote:

> > On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote
> > > Theoretically / blue sky, could there be a table or column type that
> > > transparently handles "shared strings" like this, reducing size on disk
> > > at the cost of lookup overhead for all queries?
> > > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > > only for large objects?)
>
> On Fri, Oct 13, 2017, at 01:12 PM, David G. Johnston wrote:
> > Row-independence is baked into PostgreSQL pretty deeply...
>
> Could you say more about that?
>

​Not intelligibly...basically as far as PostgreSQL is concerned all the
data to reconstruct a row from a given table is present in that table.
From a practical perspective the "TOAST table" for a table IS part of the
main table since it has no practical independent use.

As an aside I was thinking along the lines of an actual compression routine
which is what a spreadsheet file is able to do since a spreadsheet contains
the data from every row and column in a single file and is able to compress
the entire file by finding commonalities across rows and columns. A
database generally cannot do that.

As for "transparent lookup tables for text columns"...I suppose one could
implement a "system-managed-enum" type with many of the same properties of
an actual enum but avoiding many of its problems by not exposing the
enum-ness to the user and instead just exposing the text labels...I suspect
faced with prospect of doing something that complex most users would just
setup a FK relationship.

> What about the comparison to TOAST, which stores values off-table?
>

TOAST solves a technical problem related to the fact that records "on the
table" have a very small size limitation (kb) while stored values can be at
least as large as a GB. TOAST does involved compression but the input to
the compression algorithm is a single cell (row and column) in a table.​
As noted above I consider the TOAST table and main table to be a single
logical table.

Like I said the enum type has similar properties to what you want - but
Melvin is right that using it requires careful consideration of how your
data might change in the future.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G. Johnston 2017-10-13 18:21:22 Re: Permissions for Web App
Previous Message Igal @ Lucee.org 2017-10-13 18:03:52 Re: Permissions for Web App