From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | Frank Millman <frank(at)chagford(dot)com>, "pgsql-general(at)postgresql(dot)org >> PG-General Mailing List" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: a JOIN to a VIEW seems slow |
Date: | 2017-09-18 22:44:56 |
Message-ID: | CAHyXU0yW0P=iv=RKs+UnG7QB878G+UdzBFpgzeALEkLTqqi4yg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, Sep 14, 2017 at 8:17 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2017-09-14 15:09 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>
>>
>>
>> 2017-09-14 14:59 GMT+02:00 Frank Millman <frank(at)chagford(dot)com>:
>>>
>>> Pavel Stehule wrote:
>>>
>>> 2017-09-14 10:14 GMT+02:00 Frank Millman <frank(at)chagford(dot)com>:
>>>>
>>>> Hi all
>>>>
>>>> This is a follow-up to a recent question I posted regarding a slow
>>>> query. I thought that the slowness was caused by the number of JOINs in the
>>>> query, but with your assistance I have found the true reason. I said in the
>>>> previous thread that the question had become academic, but now that I
>>>> understand things better, it is no longer academic as it casts doubt on my
>>>> whole approach.
>>>>
>>>> I have split my AR transaction table into three physical tables –
>>>> ar_tran_inv, ar_tran_crn, ar_tran_rec. I will probably add others at some
>>>> point, such as ar_tran_jnl.
>>>>
>>>> I then create a VIEW to view all transactions combined. The view is
>>>> created like this -
>>>>
>>>> CREATE VIEW ar_trans AS
>>>> SELECT ‘ar_inv’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>>> FROM ar_tran_inv WHERE posted = ‘1’
>>>> UNION ALL
>>>> SELECT ‘ar_crn’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>>> FROM ar_tran_crn WHERE posted = ‘1’
>>>> UNION ALL
>>>> SELECT ‘ar_rec’ AS tran_type, row_id AS tran_row_id, tran_number ...
>>>> FROM ar_tran_rec WHERE posted = ‘1’
>>>>
>>>> I have another table called ‘ar_trans_due’, to keep track of outstanding
>>>> transactions. All of the three transaction types generate entries into this
>>>> table. To identify the source of the transaction, I have created columns in
>>>> ar_trans_due called ‘tran_type’ and ‘tran_row_id’. After inserting a row
>>>> into ‘ar_tran_inv’, I invoke this -
>>>>
>>>> INSERT INTO ar_trans_due (tran_type, tran_row_id, ...) VALUES
>>>> (‘ar_inv’, ar_tran_inv.row_id, ...), and similar for the other transaction
>>>> types. It is handled by a Python program, and it all happens within a
>>>> transaction.
>>>>
>>>> When I view a row in ar_trans_due, I want to retrieve data from the
>>>> source transaction, so I have this -
>>>>
>>>> SELECT * FROM ar_trans_due a
>>>> LEFT JOIN ar_trans b ON b.tran_type = a.tran_type AND b.tran_row_id =
>>>> a.tran_row_id
>>>>
>>>> I understand that PostgreSQL must somehow follow a path from the view
>>>> ‘ar_trans’ to the physical table ‘ar_tran_inv’, but I assumed it would
>>>> execute the equivalent of SELECT * FROM ar_tran_inv WHERE row_id =
>>>> a.tran_row_id AND posted = ‘1’.
>>>>
>>>> If this was the case, it would be an indexed read, and very fast.
>>>> Instead, according to EXPLAIN, it performs a sequential scan of the
>>>> ‘ar_tran_inv’ table.
>>>>
>>>> It also scans ‘ar_tran_crn’ and ‘ar_tran_rec’, but EXPLAIN shows that it
>>>> uses a Bitmap Heap Scan on those. I assume that is because the tables are
>>>> currently empty.
>>>>
>>>> Is this analysis correct?
>>>
>>>
>>> please, send EXPLAIN ANALYZE result :)
>>>>
>>>>
>>>
>>>
>>> I tried to reduce this to its simplest form.
>>>
>>> Here is a SQL statement -
>>>
>>> SELECT *
>>> FROM ccc.ar_trans_due a
>>> LEFT JOIN ccc.ar_trans b ON
>>> b.tran_type = a.tran_type AND b.tran_row_id = a.tran_row_id
>>> WHERE a.row_id = 1
>>>
>>> ar_trans_due is a physical table, ar_trans is a view.
>>>
>>> It takes about 28ms. Here is the explain -
>>> https://explain.depesz.com/s/8YY
>>>
>>>
>
>
> The PostgreSQL cannot to push join - in slow case, the UNIONS should be done
> first - and it requires full scan ar_tran_inv - used filter (posted AND
> (deleted_id = 0) is not too effective - maybe some composite or partial
> index helps.
In my testing JOINS can push through UNION ALL. Why do we need to
materialize union first? What version is this?
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Frank Millman | 2017-09-19 05:06:42 | Re: a JOIN to a VIEW seems slow |
Previous Message | Tom Lane | 2017-09-18 21:58:32 | Re: reload postgresql with invalid pg_hba.conf |