From: | Andrew Kane <andrew(at)chartkick(dot)com> |
---|---|
To: | Mark Dilger <hornschnorter(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Kane <andrew(at)chartkick(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: A space-efficient, user-friendly way to store categorical data |
Date: | 2018-02-13 03:57:24 |
Message-ID: | CACDdp+YG_NP9isQ9Ey2KPb_sKoRH+MUj51kJzaGm6rTocKTUcg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
They'd refer to separate enums.
I originally thought an enum was a good comparison for this feature, but
I'm no longer sure that it is. A text-based ordering would be desired
rather than the label index.
A better comparison may be a two-column lookup table:
-- create
CREATE TABLE cities (id bigserial primary key, name text)
CREATE UNIQUE INDEX ON cities (name);
CREATE TABLE users (city_id bigint);
-- insert
BEGIN;
INSERT INTO cities (name) VALUES ('Chicago') ON CONFLICT (name) DO NOTHING
RETURNING id;
INSERT INTO users (city_id) VALUES (<city id returned from earlier>);
COMMIT;
-- select
SELECT * FROM users FROM users INNER JOIN cities ON cities.id =
users.city_id WHERE name = 'Chicago';
Ideally, the lookup table could be maintained by Postgres to make reads and
writes easier.
-- create
CREATE TABLE users (city text DEDUPED);
-- insert
INSERT INTO users (city) VALUES ('Chicago');
-- query
SELECT * FROM users WHERE city = 'Chicago';
I'm not really sure the best place to store this lookup table.
- Andrew
On Mon, Feb 12, 2018 at 7:11 PM, Mark Dilger <hornschnorter(at)gmail(dot)com>
wrote:
>
> > On Feb 12, 2018, at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Andrew Kane <andrew(at)chartkick(dot)com> writes:
> >> Thanks everyone for the feedback. The current enum implementation
> requires
> >> you to create a new type and add labels outside a transaction prior to
> an
> >> insert.
> >
> > Right ...
> >
> >> Since enums have a fixed number of labels, this type of feature may be
> >> better off as a property you could add to text columns (as Thomas
> >> mentions). This would avoid issues with hitting the max number of
> labels.
> >
> > ... but you're not saying how you'd avoid the need for prior commit of
> the
> > labels. The sticking point for enums is that once a value has gotten
> into
> > a btree index, we can't ever lose the ability to compare that value to
> > others, or the index will be broken. So inserting an uncommitted value
> > into user tables has to be prevented.
> >
> > Maybe there's a way to assign the labels so that they can be compared
> > without reference to any outside data, but it's not clear to me how
> > that would work.
>
> When I implemented this, I wrote the comparators to work on the Oid for
> the value, not the string representation. That works fine. If you want to
> sort the data on the stringified version, cast to text first. That works
> well
> enough for me, since I'm typically not interested in what sort order is
> used,
> as long as it is deterministic and works for indexing, group by, and so
> forth.
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2018-02-13 03:58:22 | Re: rename sgml files? |
Previous Message | Michael Paquier | 2018-02-13 03:52:38 | Re: rename sgml files? |