From: | Ron Johnson <ronljohnsonjr(at)gmail(dot)com> |
---|---|
To: | Adrian Garcia Badaracco <adrian(at)adriangb(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Compressing large column by moving it to a unique table |
Date: | 2024-02-13 06:16:10 |
Message-ID: | CANzqJaBp4nVkAR+sXdQKN=8iTu6Pm4RTfTDPgPxHZ1S1JhMDgw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Mon, Feb 12, 2024 at 10:12 PM Adrian Garcia Badaracco <
adrian(at)adriangb(dot)com> wrote:
> I am using Timescale so I'll be mentioning some timestamp stuff but I
> think this is a general postgres question for the most part.
>
> I have a table with some fixed, small columns (id, timestamp, etc) and a
> large JSONB column (let's call it `attributes`). `attributes` has 1000s of
> schemas, but given a schema, there's a lot of duplication. Across all rows,
> more than 99% of the data is duplicated (as measured by `count(attributes)`
> vs `count(distinct attributes)`.
>
> I can't normalize `attributes` into real columns because it is quite
> variable (remember 1000s of schemas).
>
> My best idea is to make a table like `(day timestamptz, hash text,
> attributes jsonb)` and then in my original table replace `attributes` with
> a reference to `new_table`.
>
Meaning that there are many fewer rows in new_table?
> I can then make a view that joins them `select original_table.timestamp,
> new_table.attributes from original join new_table on (time_bucket('1 day',
> timestamp) = day AND original.hash = new_table.hash)` or something like
> that. The idea of time bucketing into 1 day is to balance write and read
> speed (by relying on timescale to do efficient time partitioning, data
> retention, etc.).
>
> I recognize this is essentially creating a key-value store in postgres and
> also janky compression, so I am cautious about it.
>
If my interpretation (that there are many fewer rows in new_table) is
correct, then you've stumbled into the Second Normal Form of database
design: https://en.wikipedia.org/wiki/Second_normal_form#Example
From | Date | Subject | |
---|---|---|---|
Next Message | Durumdara | 2024-02-13 07:56:26 | MAT. VIEW security problems and PG 10-11 versions? |
Previous Message | Adrian Garcia Badaracco | 2024-02-13 03:11:34 | Compressing large column by moving it to a unique table |