When to store data that could be derived

From: Frank <frank(at)chagford(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: When to store data that could be derived
Date: 2019-03-24 06:42:09
Message-ID: a2b19f30-9e1d-aea5-7b33-427047522461@chagford.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all

As I understand it, a  general rule of thumb is that you should never
create a physical column if the data could be derived from existing
columns. A possible reason for breaking this rule is for performance
reasons.

I have a situation where I am considering breaking the rule, but I am
not experienced enough in SQL to know if my reason is valid. I would
appreciate it if someone could glance at my 'before' and 'after'
scenarios and see if, from a 'gut-feel' point of view, I should proceed.

I have a VIEW constructed as follows -

CREATE VIEW view_name AS
[select statement 1]
UNION ALL
[select statement 2]
etc.

This is one of the select statements. I will give the 'after' scenario
first -

SELECT
    'arec' AS tran_type, a.row_id AS tran_row_id,
    a.tran_number AS tran_number, a.cust_row_id AS cust_row_id,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_date
        WHEN a.tran_type = 'cb_rec' THEN w.tran_date
    END AS tran_date,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.text
        WHEN a.tran_type = 'cb_rec' THEN w.text
    END AS text,
    0 - a.arec_cust AS amount_cust,
    0 - a.arec_local AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.posted
        WHEN a.tran_type = 'cb_rec' THEN w.posted
    END = '1'

The two columns a.arec_cust and a.arec_local *can* be derived from other
columns, and in fact that is how it is working at the moment, so here is
the 'before' scenario -

SELECT
    'arec' AS tran_type, a.row_id AS tran_row_id,
    a.tran_number AS tran_number, a.cust_row_id AS cust_row_id,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_date
        WHEN a.tran_type = 'cb_rec' THEN w.tran_date
    END AS tran_date,
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.text
        WHEN a.tran_type = 'cb_rec' THEN w.text
    END AS text,
    ROUND(0 - (ROUND(a.arec_amount / CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate
        WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate
    END * a.cust_exch_rate, u.scale)), 2) AS amount_cust,
    ROUND(0 - (ROUND(a.arec_amount / CASE
        WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate
        WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate
    END, s.scale)), 2) AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
LEFT JOIN ar_customers v ON v.row_id = a.cust_row_id
LEFT JOIN adm_currencies u ON u.row_id = v.currency_id
LEFT JOIN adm_params t ON t.row_id = 1
LEFT JOIN adm_currencies s ON s.row_id = t.local_curr_id
WHERE
    CASE
        WHEN a.tran_type = 'ar_rec' THEN y.posted
        WHEN a.tran_type = 'cb_rec' THEN w.posted
    END = '1'

As you can see, complexity has increased and there are four additional
JOINs.

I am expecting the VIEW to be used extensively for query purposes, and
my gut-feel says that the second one is likely to lead to performance
problems in a system with a lot of data and a lot of users.

I am not looking for an answer - I know that I should create dummy data
and run some timing tests. I was just wondering if someone more
experienced would wince when they look at the second SELECT, or if they
would shrug and think that it looks fine.

Any input will be appreciated.

Frank Millman

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2019-03-24 07:25:44 Re: When to store data that could be derived
Previous Message Tom Lane 2019-03-24 05:34:41 Re: regr_slope returning NULL