Re: Varchar vs foreign key vs enumerator - table and index size

From: Łukasz Walkowski <lukasz(dot)walkowski(at)homplex(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Varchar vs foreign key vs enumerator - table and index size
Date: 2013-08-31 17:06:01
Message-ID: A698EB68-A22F-427A-9A7D-0C251B03A7EC@homplex.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom,

> If you're starting to be concerned about space, it's definitely time to
> get away from this choice. Depending on what locale you're using,
> comparing varchar values can be quite an expensive operation, too.

I don't like wasting space and processing power even if more work is required to achieve this. We use pl_PL.UTF-8 as our locale.

> I think the main "pro" of this approach is that it doesn't use any
> nonstandard SQL features, so you preserve your options to move to some
> other database in the future. The main "con" is that you'd be buying into
> fairly significant rewriting of your application code, since just about
> every query involving these columns would have to become a join.

Well, I don't really think I will move from Postgresql anytime soon. It's just the best database for me. Rewriting code is one of the things I'm doing right now but before I touch database, I want to be sure that the choices I made are good.

> FWIW, I'd be inclined to just use integer not smallint. The space savings
> from smallint is frequently illusory because of alignment considerations
> --- for instance, an index on a single smallint column will *not* be any
> smaller than one on a single int column. And smallint has some minor
> usage annoyances because it's a second-class citizen in the type promotion
> hierarchy --- you may find yourself needing explicit casts to smallint
> here and there.

Ok, thats important information. Thank you.

>
> Space-wise this is going to be equivalent to the integer-foreign-key
> solution. It's much nicer from a notational standpoint, though, because
> you don't need joins --- it's likely that you'd need few if any
> application code changes to go this route. (But I'd advise doing some
> testing to verify that before you take it as a given.)
>
> You're right though that enums are not a good option if you expect
> frequent changes in the pool of allowed values. I guess the question
> is how often does that happen, in your application? Adding a new value
> from time to time isn't much of a problem unless you want to get picky
> about how it sorts relative to existing values. But you can't ever delete
> an individual enum value, and we don't support renaming them either.
> (Though if you're desperate, I believe a manual UPDATE on the pg_enum
> catalog would work for that.)
>
> Another thing to think about is whether you have auxiliary data about each
> value that might usefully be stored as additional columns in the small
> tables. The enum approach doesn't directly handle that, though I suppose
> you could still create small separate tables that use an enum column as
> primary key.
>
> regards, tom lane

So, I'll go for enumerators for device type, eventtype and eventsource as those columns are quite stable. For browser and operating system I'll do external tables.

Thank you - any additional tips are welcome.

Reagards,
Lukasz Walkowski

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2013-09-01 01:31:06 Re: Varchar vs foreign key vs enumerator - table and index size
Previous Message Tom Lane 2013-08-31 15:20:47 Re: Varchar vs foreign key vs enumerator - table and index size