Re: a JOIN to a VIEW seems slow

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Frank Millman <frank(at)chagford(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: a JOIN to a VIEW seems slow
Date: 2017-09-14 13:17:55
Message-ID: CAFj8pRAsjMx54YhBkiYi1nTZpkz7OmrkNCxGj1OU+cj0a8MVow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2017-09-14 14:59 GMT+02:00 Frank Millman <frank(at)chagford(dot)com>:
>
>> Pavel Stehule wrote:
>>
>> 2017-09-14 10:14 GMT+02:00 Frank Millman <frank(at)chagford(dot)com>:
>>
>>> Hi all
>>>
>>> This is a follow-up to a recent question I posted regarding a slow
>>> query. I thought that the slowness was caused by the number of JOINs in the
>>> query, but with your assistance I have found the true reason. I said in the
>>> previous thread that the question had become academic, but now that I
>>> understand things better, it is no longer academic as it casts doubt on my
>>> whole approach.
>>>
>>> I have split my AR transaction table into three physical tables –
>>> ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some
>>> point, such as ar_tran_jnl.
>>>
>>> I then create a VIEW to view all transactions combined. The view is
>>> created like this -
>>>
>>> CREATE VIEW ar_trans AS
>>> SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>> FROM ar_tran_inv WHERE posted = ‘1’
>>> UNION ALL
>>> SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>> FROM ar_tran_crn WHERE posted = ‘1’
>>> UNION ALL
>>> SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>> FROM ar_tran_rec WHERE posted = ‘1’
>>>
>>> I have another table called ‘ar_trans_due’, to keep track of outstanding
>>> transactions. All of the three transaction types generate entries into this
>>> table. To identify the source of the transaction, I have created columns in
>>> ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a
>>> row into ‘ar_tran_inv’, I invoke this -
>>>
>>> INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES
>>> (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction
>>> types. It is handled by a Python program, and it all happens within a
>>> transaction.
>>>
>>> When I view a row in ar_trans_due, I want to retrieve data from the
>>> source transaction, so I have this -
>>>
>>> SELECT * FROM ar_trans_due a
>>> LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id =
>>> a.tran_row_id
>>>
>>> I understand that PostgreSQL must somehow follow a path from the view
>>> ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would
>>> execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id =
>>> a.tran_row_id AND posted = ‘1’.
>>>
>>> If this was the case, it would be an indexed read, and very fast.
>>> Instead, according to EXPLAIN, it performs a sequential scan of the
>>> ‘ar_tran_inv’ table.
>>>
>>> It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it
>>> uses a Bitmap Heap Scan on those. I assume that is because the tables are
>>> currently empty.
>>>
>>> Is this analysis correct?
>>>
>>
>> please, send EXPLAIN ANALYZE result :)
>>
>>>
>>>
>>
>> I tried to reduce this to its simplest form.
>>
>> Here is a SQL statement -
>>
>> SELECT *
>> FROM ccc.ar_trans_due a
>> LEFT JOIN ccc.ar_trans b ON
>> b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
>> WHERE a.row_id = 1
>>
>> ar_trans_due is a physical table, ar_trans is a view.
>>
>> It takes about 28ms. Here is the explain - https://explain.depesz.com/s/8
>> YY
>>
>
>>
>
The PostgreSQL cannot to push join - in slow case, the UNIONS should be
done first - and it requires full scan ar_tran_inv - used filter (posted
AND (deleted_id = 0) is not too effective - maybe some composite or partial
index helps.

The fast query doesn't contains unions - so there are bigger space for
optimizer - ar_tran_inv is filtered effective - by primary key.

So main problem is impossible to push information a.row_id = 1 to deep to
query.

> Then I changed it to join each of the physical tables, instead of the view
>> -
>>
>> SELECT *
>> FROM ccc.ar_trans_due a
>> LEFT JOIN ccc.ar_tran_inv b ON
>> b.row_id = a.tran_row_id
>> LEFT JOIN ccc.ar_tran_crn c ON
>> c.row_id = a.tran_row_id
>> LEFT JOIN ccc.ar_tran_rec d ON
>> d.row_id = a.tran_row_id
>> WHERE a.row_id = 1
>>
>> This takes just over 1ms. Here is the explain -
>> https://explain.depesz.com/s/U29h
>>
>> I tried setting enable_seq_scan to off – it ran even slower!
>>
>> Frank
>>
>>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message vinny 2017-09-14 13:32:12 Re: looking for a globally unique row ID
Previous Message Pavel Stehule 2017-09-14 13:09:23 Re: a JOIN to a VIEW seems slow