Re: A space-efficient, user-friendly way to store categorical data

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Kane <andrew(at)chartkick(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A space-efficient, user-friendly way to store categorical data
Date: 2018-02-13 03:11:25
Message-ID: CB54BD86-F2AB-416E-B13A-E0552C029E19@gmail.com
Lists: pgsql-hackers


> On Feb 12, 2018, at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andrew Kane <andrew(at)chartkick(dot)com> writes:
>> Thanks everyone for the feedback. The current enum implementation requires
>> you to create a new type and add labels outside a transaction prior to an
>> insert.
>
> Right ...
>
>> Since enums have a fixed number of labels, this type of feature may be
>> better off as a property you could add to text columns (as Thomas
>> mentions). This would avoid issues with hitting the max number of labels.
>
> ... but you're not saying how you'd avoid the need for prior commit of the
> labels. The sticking point for enums is that once a value has gotten into
> a btree index, we can't ever lose the ability to compare that value to
> others, or the index will be broken. So inserting an uncommitted value
> into user tables has to be prevented.
>
> Maybe there's a way to assign the labels so that they can be compared
> without reference to any outside data, but it's not clear to me how
> that would work.

When I implemented this, I wrote the comparators to work on the Oid for
the value, not the string representation. That works fine. If you want to
sort the data on the stringified version, cast to text first. That works well
enough for me, since I'm typically not interested in what sort order is used,
as long as it is deterministic and works for indexing, group by, and so forth.
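
A rough sketch of that idea, in Python rather than the backend's C, and not the actual implementation discussed above: LabelCatalog, intern, compare_ids, and the starting id 16384 are all hypothetical names and values made up for illustration. Each label gets an integer id on first use (a stand-in for the Oid), the comparator looks only at the ids, and ordering by label requires an explicit lookup, analogous to casting to text first.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class LabelCatalog:
    """Hypothetical stand-in for a catalog that hands out one id per label."""
    next_id: int = 16384  # arbitrary starting id, chosen only for illustration
    ids: dict = field(default_factory=dict)     # label -> id
    labels: dict = field(default_factory=dict)  # id -> label

    def intern(self, label: str) -> int:
        """Return the id for a label, assigning a fresh one on first use."""
        if label not in self.ids:
            self.ids[label] = self.next_id
            self.labels[self.next_id] = label
            self.next_id += 1
        return self.ids[label]


def compare_ids(a: int, b: int) -> int:
    """Three-way comparator that looks only at the ids, never at the labels."""
    return (a > b) - (a < b)


catalog = LabelCatalog()

# What a column would physically store: ids, not strings.
stored = [catalog.intern(s) for s in ["pear", "apple", "pear", "fig"]]

# Default ordering: by id. Deterministic, but unrelated to alphabetical order.
by_id = sorted(stored, key=functools.cmp_to_key(compare_ids))

# Ordering by label: the analogue of casting to text before sorting.
by_text = sorted(stored, key=lambda i: catalog.labels[i])

print([catalog.labels[i] for i in by_id])    # id order
print([catalog.labels[i] for i in by_text])  # label order
```

In this sketch the id order simply follows the order in which labels were first seen, so it is stable and usable for equality, grouping, and index-style comparisons, while alphabetical order is only available through the explicit label lookup.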
