Quick Links

Re: "Shared strings"-style table

From:	Rob Sargent <robjsargent(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: "Shared strings"-style table
Date:	2017-10-13 16:09:05
Message-ID:	dd0a0ae0-c28c-92d7-0650-559a4a43dd7b@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 10/13/2017 09:49 AM, Seamus Abshere wrote:
> hey,
>
> In the spreadsheet world, there is this concept of "shared strings," a
> simple way of compressing spreadsheets when the data is duplicated in
> many cells.
>
> In my database, I have a table with >200 million rows and >300 columns
> (all the households in the United States). For clarity of development
> and debugging, I have not made any effort to normalize its contents, so
> millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
> TOWNHOUSE" (yes, that whole string!) instead of some code representing
> it.
>
> Theoretically / blue sky, could there be a table or column type that
> transparently handles "shared strings" like this, reducing size on disk
> at the cost of lookup overhead for all queries?
>
> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> only for large objects?)
>
> Thanks,
> Seamus
>
> --
> Seamus Abshere, SCEA
> https://www.faraday.io
> https://github.com/seamusabshere
> https://linkedin.com/in/seamusabshere
>
>
What data type are these columns now? I would be tempted to map the
full strings to an abbreviation just so I didn't have to alter all the
columns to an "id"; Optional to place any RI on the columns to the
abbreviation dictionary table. Just use the translation as a last step
in user facing reports. If you can map/abbreviate to 4 characters,
you've approximated the disk size of an integer.

In response to

"Shared strings"-style table at 2017-10-13 15:49:21 from Seamus Abshere

Browse pgsql-general by date

	From	Date	Subject
Next Message	David G. Johnston	2017-10-13 16:12:23	Re: "Shared strings"-style table
Previous Message	Seamus Abshere	2017-10-13 15:49:21	"Shared strings"-style table