A space-efficient, user-friendly way to store categorical data

From: Andrew Kane <andrew(at)chartkick(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: A space-efficient, user-friendly way to store categorical data
Date: 2018-02-11 03:46:41
Message-ID: CACDdp+b0=o_jsoLnmq=5eL3mmpcxxYH1AZoqg-yz9tSP1+rVyA@mail.gmail.com
Lists: pgsql-hackers

Hi,

I'm hoping to get feedback on an idea for a new data type to allow for
efficient storage of text values while keeping reads and writes
user-friendly. Suppose you want to store categorical data like current city
for users. There will be a long list of cities, and many users will have
the same city. Some options are:

- Use a text column
- Use an enum column - saves space, but labels must be set ahead of time
- Create another table for cities (normalize) - saves space, but
complicates reads and writes

A better option could be a new "dynamic enum" type, which would have
storage requirements similar to an enum's, but instead of labels being
declared ahead of time, they would be added as data is inserted.
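
To make the idea concrete, here is a rough sketch in Python of the
behavior such a type might have. It is only a toy model, not a proposed
implementation: the names DynamicEnum and CityColumn are made up for
illustration, and it ignores everything a real type would have to handle
(on-disk format, concurrency, label removal, ordering).

```python
class DynamicEnum:
    """Toy model of a "dynamic enum": a label gets a small integer id
    the first time it is inserted, and the same id on every later insert."""

    def __init__(self):
        self._id_by_label = {}  # label -> id, consulted on write
        self._label_by_id = []  # id -> label, consulted on read

    def encode(self, label):
        """Return the id for label, registering the label if it is new."""
        ident = self._id_by_label.get(label)
        if ident is None:
            ident = len(self._label_by_id)
            self._id_by_label[label] = ident
            self._label_by_id.append(label)
        return ident

    def decode(self, ident):
        """Return the label that was registered under ident."""
        return self._label_by_id[ident]

    def label_count(self):
        """Number of distinct labels seen so far."""
        return len(self._label_by_id)


class CityColumn:
    """Hypothetical column: stores one small id per row, but callers
    read and write plain strings."""

    def __init__(self):
        self.labels = DynamicEnum()
        self.rows = []

    def insert(self, city):
        self.rows.append(self.labels.encode(city))

    def read(self, row):
        return self.labels.decode(self.rows[row])


column = CityColumn()
for city in ["Chicago", "Boston", "Chicago", "Chicago"]:
    column.insert(city)
```

After the four inserts, the rows hold only small ids and each distinct
city string is stored once, while reads still return the original text.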

It'd be great to hear what others think of this (or if I'm missing
something). Another direction could be to deduplicate values for TOAST-able
data types.

Thanks,
Andrew
