From: | "Frank Millman" <frank(at)chagford(dot)com> |
---|---|
To: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: a JOIN to a VIEW seems slow |
Date: | 2017-09-14 12:59:56 |
Message-ID: | 7819910CA9CE46C8BC311DEB8147C8AA@FrankLaptop |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Pavel Stehule wrote:
2017-09-14 10:14 GMT+02:00 Frank Millman <frank(at)chagford(dot)com>:
Hi all
This is a follow-up to a recent question I posted regarding a slow query. I thought that the slowness was caused by the number of JOINs in the query, but with your assistance I have found the true reason. I said in the previous thread that the question had become academic, but now that I understand things better, it is no longer academic as it casts doubt on my whole approach.
I have split my AR transaction table into three physical tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some point, such as ar_tran_jnl.
I then create a VIEW to view all transactions combined. The view is created like this -
CREATE VIEW ar_trans AS
SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_inv WHERE posted = ‘1’
UNION ALL
SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_crn WHERE posted = ‘1’
UNION ALL
SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ... FROM ar_tran_rec WHERE posted = ‘1’
I have another table called ‘ar_trans_due’, to keep track of outstanding transactions. All of the three transaction types generate entries into this table. To identify the source of the transaction, I have created columns in ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row into ‘ar_tran_inv’, I invoke this -
INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction types. It is handled by a Python program, and it all happens within a transaction.
When I view a row in ar_trans_due, I want to retrieve data from the source transaction, so I have this -
SELECT * FROM ar_trans_due a
LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
I understand that PostgreSQL must somehow follow a path from the view ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id = a.tran_row_id AND posted = ‘1’.
If this was the case, it would be an indexed read, and very fast. Instead, according to EXPLAIN, it performs a sequential scan of the ‘ar_tran_inv’ table.
It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it uses a Bitmap Heap Scan on those. I assume that is because the tables are currently empty.
Is this analysis correct?
please, send EXPLAIN ANALYZE result :)
I tried to reduce this to its simplest form.
Here is a SQL statement -
SELECT *
FROM ccc.ar_trans_due a
LEFT JOIN ccc.ar_trans b ON
b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
WHERE a.row_id = 1
ar_trans_due is a physical table, ar_trans is a view.
It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8YY
Then I changed it to join each of the physical tables, instead of the view -
SELECT *
FROM ccc.ar_trans_due a
LEFT JOIN ccc.ar_tran_inv b ON
b.row_id = a.tran_row_id
LEFT JOIN ccc.ar_tran_crn c ON
c.row_id = a.tran_row_id
LEFT JOIN ccc.ar_tran_rec d ON
d.row_id = a.tran_row_id
WHERE a.row_id = 1
This takes just over 1ms. Here is the explain - https://explain.depesz.com/s/U29h
I tried setting enable_seq_scan to off – it ran even slower!
Frank
From | Date | Subject | |
---|---|---|---|
Next Message | Rafal Pietrak | 2017-09-14 13:06:47 | Re: looking for a globally unique row ID |
Previous Message | rob stone | 2017-09-14 12:38:21 | Re: How to add new Collation language |