Quick Links

Re: When to store data that could be derived

From:	Frank <frank(at)chagford(dot)com>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: When to store data that could be derived
Date:	2019-03-26 14:37:38
Message-ID:	24fa9e9a-20e8-9bf1-2400-9e9d12d6bf69@chagford.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 2019-03-25 5:44 PM, Frank wrote:
>
>
> On reflection, I have not been consistent with my use of indexes, and I
> think that will affect the query plan.
>
> There are at least two issues -
>
> 1. The first table defined in the VIEW is ccc.ar_tran_inv. It has the
> following index -
> "arinv_cust_date" btree (cust_row_id NULLS FIRST, tran_date NULLS
> FIRST) WHERE deleted_id = 0
>
> I have not used "WHERE deleted_id = 0" when constructing the VIEW, but I
> have used "WHERE posted = '1'". I don't think the index can be used with
> this setup.
>
> 2. The complicated table in the VIEW is ccc.ar_rec_subtran. Various
> columns such as tran_date and posted are retrieved via CASE expressions
> from two underlying tables. Those tables have certain indexes defined,
> but I cannot see how they can be utilised from my current setup.
>
> I think I should spend some time tidying this up before you try to make
> sense of the query plan. Any tips on how to improve it will be appreciated.
>

I have spent a lot of time testing various permutations and trying to
understand them using EXPLAIN.

My original concern was the overhead of calculating derived data. I now
realise that it is more important to get the indexes right, as that has
a much bigger impact on performance.

The VIEW that I have been using for testing is actually masking the
problem. The view combines four tables, three of which are
straightforward, easy to index, and fast. The fourth one is complex,
difficult to index, and comparatively slow. So I forgot about the VIEW
and concentrated on the complex table.

I now understand the caveats I received earlier in this thread. It seems
impossible to make use of the indexes on the JOINed tables in the
following query. I did create an index on cust_row_id in the main table,
and it made a difference with a simple query, but by the time I added
the JOINs, the improvement was hardly noticeable.

This is the query that I used -

EXPLAIN SELECT COUNT(*)
FROM ccc.ar_rec_subtran a
LEFT JOIN ccc.ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ccc.ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN ccc.cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN ccc.cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE a.cust_row_id = 4 AND
CASE
WHEN a.tran_type = 'ar_rec' THEN y.tran_date
WHEN a.tran_type = 'cb_rec' THEN w.tran_date
END BETWEEN '2015-05-01' AND '2015-05-31' AND
CASE
WHEN a.tran_type = 'ar_rec' THEN y.posted
WHEN a.tran_type = 'cb_rec' THEN w.posted
END = true AND
a.deleted_id = 0;

Attached is the EXPLAIN for this one.

I also ran the same query with the following WHERE clause -

WHERE a.cust_row_id = 4 AND
((a.tran_type = 'ar_rec' AND
y.tran_date BETWEEN '2015-05-01' AND '2015-05-31') OR
(a.tran_type = 'cb_rec' AND
w.tran_date BETWEEN '2015-05-01' AND '2015-05-31')) AND
((a.tran_type = 'ar_rec' AND y.posted = true) OR
(a.tran_type = 'cb_rec' AND w.posted = true)) AND
a.deleted_id = 0;

The timings were virtually identical, so I have not attached that EXPLAIN.

Frank

Attachment	Content-Type	Size
explain_2	text/plain	3.1 KB

In response to

Re: When to store data that could be derived at 2019-03-25 15:44:21 from Frank

Browse pgsql-general by date

	From	Date	Subject
Next Message	Павлухин Иван	2019-03-26 15:16:21	Re: Column lookup in a row performance
Previous Message	Adrian Klaver	2019-03-26 14:24:00	Re: Key encryption and relational integrity