From: | Olivier Macchioni <olivier(dot)macchioni(at)wingo(dot)ch> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Dynamic Enum? |
Date: | 2017-12-21 10:34:22 |
Message-ID: | 2139E532-8B9B-407A-8FC9-5060D24521CE@wingo.ch |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
First post on the list, please be indulgent :)
I'm working on a fairly large DB (for me) - 23 GB data, 42 GB indexes, 100 M rows for my biggest table - and trying to find possible optimisations on the storage requirements... and hopefully trying to keep more indexes in RAM.
This DB is actually the import of "events" on a sliding window of 30 days, and I have no control over the events which are generated. I have control of the import script though.
Many of the columns have the following characteristics:
- VARCHAR
- low cardinality (typically < 100 distinct values)
- but I can see new values appearing "at any time" when importing data from external systems. I don't expect the cardinality to grow significantly though.
The naive storage of those columns is quite demanding when compared to the amount of information they carry, and I'm looking at solutions to optimise this. Obviously I could:
- use ENUMs to reduce the storage space to 4 bytes on disk (cf https://www.postgresql.org/docs/current/static/datatype-enum.html <https://www.postgresql.org/docs/current/static/datatype-enum.html>) assuming I managed the ENUMs by adding new values when needed. This would probably shrink the indexes significantly as well. It may have an impact on the comparison of values as well.
- normalize the DB by adding another table and a FOREIGN KEY - the management of this table could be done via triggers for instance, with a cost in complexity (triggers, applications accessing the DB, ...)
I would find much more elegant to use a datatype where my VARCHARs would be be internally stored as a SMALLINT (or similar), indexed as SMALLINT, while still being able to be externally seen as if it was a VARCHAR (comparison, ORM bindings, ...)
I didn't find any datatype which would work like this :(
Does anyone know of such a solution ?
Thank you,
Olivier
From | Date | Subject | |
---|---|---|---|
Next Message | Hans Schou | 2017-12-21 10:37:05 | Deadlock with one table - PostgreSQL is doing it right |
Previous Message | Magnus Hagander | 2017-12-21 10:27:07 | Re: psycopg2 and java gssapi questions |