Re: full join in view

From: "Tambet Matiisen" <t(dot)matiisen(at)aprote(dot)ee>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: full join in view
Date: 2003-01-14 14:27:22
Message-ID: 81132473206F3A46A72BD6116E1A06AE1B14C2@black.aprote.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


First I would like to say, that I'm quite amazed. You even guessed table names right! :) I did not expect such an in-depth analysis in such a short time. Thanks, Tomasz!

> -----Original Message-----
> From: Tomasz Myrta [mailto:jasiek(at)klaster(dot)net]
> Sent: Tuesday, January 14, 2003 11:51 AM
> To: Tambet Matiisen
> Cc: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] full join in view
>
>
> Tambet Matiisen wrote:
>
> > You are right. After disabling seq_scan, it uses indexes
> just as you
> > described. Unfortunately my view happens to use subquery:
>
> Don't disable seq_scan - sometimes it is better than indexscan.

I just did it for testing and for one session. I never disable it on production server.

...
>
> There is one more problem in your query - coalesce, which possibly
> disables any indexing in your view. Try to rewrite your view
> - subquery
> shouldn't return dor_kst_id and dor_mat_id null.
>
Coalesce did not pose any problems. Unless I tried to filter using one of the coalesce-fields, which does not use indeces of course.

> Is dor_kst_id the same as kdt_kst_id and as mat_id? After
> some database
> practicing I found, that using the same name in all tables is
> much more
> comfortably
>
This way I can refer most columns without prefixing them with table alias. But it's anyway good habit to use table aliases, so this is not that important. I think in next project I try it in your way.

> For each material (materjalid) and koostud (koostud) you want to find
> some current value (koostude_detaild) and compare it to some sum
> (documentid...)?
> I'm not sure if I understand well your view, but here is my
> version of
> this view - without subquery:
>

I tried to save few scans by not including "koostud" and "materjalid" in my original query. Based on yours, I created a new version:

CREATE OR REPLACE VIEW v_tegelikud_kulud2 AS
SELECT
kst.kst_id,
mat.mat_id,
max(kdt.detaili_nr) AS detaili_nr,
max(kdt.arv) AS arv,
max(kdt.kulu) AS kulu,
max(kdt.yhik) AS yhik,
max(kdt.koefitsent) AS koefitsent,
max(kdt.eeldatav_hind) AS eeldatav_hind,
sum(dor.kogus * dor.koefitsent::numeric) AS kogus,
sum(dor.kokku) AS kokku
FROM koostud kst
CROSS JOIN materjalid mat
LEFT JOIN koostude_detailid kdt
ON kst.kst_id = kdt.kdt_kst_id AND mat.mat_id = kdt.kdt_mat_id
LEFT JOIN dokumentide_read dor
ON kst.kst_id = dor.dor_kst_id AND mat.mat_id = dor.dor_mat_id AND EXISTS
(
SELECT 1
FROM dokumendid dok
WHERE dor.dor_dok_id = dok.dok_id AND dok.tyyp = 30 AND dok.kinnitaja IS NOT NULL
)
WHERE kdt.kdt_id IS NOT NULL AND dor.dor_id IS NOT NULL
GROUP BY kst.kst_id, mat.mat_id;

I do a cross join between "koostud" and "materjalid", because "FROM koostud kst, materjalid mat" gave me syntax errors. Also I had to move "dokumendid" table to EXISTS subquery, to get equivalent results with the original query. Just LEFT JOIN-ing it is not enough and subquery troubled optimizer. There is also a WHERE condition to show only those materials, that appear in one of the tables "koostude_detailid" or "dokumentide_read". Here is the execution plan:

explain select count(1) from v_tegelikud_kulud2 where kst_id = 1125;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=92.60..92.60 rows=1 width=107)
-> Subquery Scan v_tegelikud_kulud2 (cost=69.58..92.58 rows=8 width=107)
-> Aggregate (cost=69.58..92.58 rows=8 width=107)
-> Group (cost=69.58..91.00 rows=79 width=107)
-> Merge Join (cost=69.58..90.61 rows=79 width=107)
Merge Cond: (("outer".kst_id = "inner".dor_kst_id) AND ("outer".mat_id = "inner".dor_mat_id))
Join Filter: (subplan)
Filter: ("inner".dor_id IS NOT NULL)
-> Merge Join (cost=17.54..18.52 rows=79 width=66)
Merge Cond: (("outer".kst_id = "inner".kdt_kst_id) AND ("outer".mat_id = "inner".kdt_mat_id))
Filter: ("inner".kdt_id IS NOT NULL)
-> Sort (cost=10.56..10.76 rows=79 width=8)
Sort Key: kst.kst_id, mat.mat_id
-> Nested Loop (cost=0.00..8.07 rows=79 width=8)
-> Index Scan using kst_pk on koostud kst (cost=0.00..4.49 rows=1 width=4)
Index Cond: (kst_id = 1125)
-> Seq Scan on materjalid mat (cost=0.00..2.79 rows=79 width=4)
-> Sort (cost=6.98..7.27 rows=113 width=58)
Sort Key: kdt.kdt_kst_id, kdt.kdt_mat_id
-> Seq Scan on koostude_detailid kdt (cost=0.00..3.13 rows=113 width=58)
-> Sort (cost=52.03..53.93 rows=761 width=41)
Sort Key: dor.dor_kst_id, dor.dor_mat_id
-> Seq Scan on dokumentide_read dor (cost=0.00..15.61 rows=761 width=41)
SubPlan
-> Index Scan using dok_pk on dokumendid dok (cost=0.00..3.47 rows=1 width=0)
Index Cond: ($0 = dok_id)
Filter: ((tyyp = 30) AND (kinnitaja IS NOT NULL))
(27 rows)

But there are still few things that worry me:
1. Cross join between koostud and materjalid. Table "materjalid" may have up to 10000 rows and only 20-30 of them are actually needed.
2. Indeces on "koostude_detailid" and "dokumentide_read" are not used. Probably my tables do not contain enough rows. Maybe I should generate more test data first.
3. The cost of this query is twice as big, as my original query. It seems to me, that SubPlan is causing this. I tried to move it to subquery, but then the optimizer chose a totally different execution plan and seemingly was not able to use indeces of "dokumentide_read" table. The version with subquery:

CREATE OR REPLACE VIEW v_tegelikud_kulud3 AS
SELECT
kst.kst_id,
mat.mat_id,
max(kdt.detaili_nr) AS detaili_nr,
max(kdt.arv) AS arv,
max(kdt.kulu) AS kulu,
max(kdt.yhik) AS yhik,
max(kdt.koefitsent) AS koefitsent,
max(kdt.eeldatav_hind) AS eeldatav_hind,
sum(dor.kogus * dor.koefitsent::numeric) AS kogus,
sum(dor.kokku) AS kokku
FROM koostud kst
CROSS JOIN materjalid mat
LEFT JOIN koostude_detailid kdt
ON kst.kst_id = kdt.kdt_kst_id AND mat.mat_id = kdt.kdt_mat_id
LEFT JOIN
(
SELECT dor.*
FROM dokumentide_read dor
JOIN dokumendid dok ON dor.dor_dok_id = dok.dok_id
WHERE dok.tyyp = 30 AND dok.kinnitaja IS NOT NULL
) dor
ON kst.kst_id = dor.dor_kst_id AND mat.mat_id = dor.dor_mat_id
WHERE kdt.kdt_id IS NOT NULL AND dor.dor_id IS NOT NULL
GROUP BY kst.kst_id, mat.mat_id;

And execution plan:

hekotek=# explain select count(1) from v_tegelikud_kulud3 where kst_id = 1125;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=56.52..56.52 rows=1 width=111)
-> Subquery Scan v_tegelikud_kulud3 (cost=53.31..56.50 rows=8 width=111)
-> Aggregate (cost=53.31..56.50 rows=8 width=111)
-> Group (cost=53.31..54.92 rows=79 width=111)
-> Merge Join (cost=53.31..54.53 rows=79 width=111)
Merge Cond: (("outer".kst_id = "inner".dor_kst_id) AND ("outer".mat_id = "inner".dor_mat_id))
Filter: ("inner".dor_id IS NOT NULL)
-> Merge Join (cost=17.53..18.50 rows=79 width=66)
Merge Cond: (("outer".kst_id = "inner".kdt_kst_id) AND ("outer".mat_id = "inner".kdt_mat_id))
Filter: ("inner".kdt_id IS NOT NULL)
-> Sort (cost=10.54..10.74 rows=79 width=8)
Sort Key: kst.kst_id, mat.mat_id
-> Nested Loop (cost=0.00..8.05 rows=79 width=8)
-> Index Scan using kst_pk on koostud kst (cost=0.00..4.47 rows=1 width=4)
Index Cond: (kst_id = 1125)
-> Seq Scan on materjalid mat (cost=0.00..2.79 rows=79 width=4)
-> Sort (cost=6.98..7.27 rows=113 width=58)
Sort Key: kdt.kdt_kst_id, kdt.kdt_mat_id
-> Seq Scan on koostude_detailid kdt (cost=0.00..3.13 rows=113 width=58)
-> Sort (cost=35.78..35.99 rows=84 width=45)
Sort Key: dor.dor_kst_id, dor.dor_mat_id
-> Merge Join (cost=9.04..33.09 rows=84 width=45)
Merge Cond: ("outer".dor_dok_id = "inner".dok_id)
-> Index Scan using dor_dok_fk_i on dokumentide_read dor (cost=0.00..20.91 rows=761 width=41)
-> Sort (cost=9.04..9.13 rows=36 width=4)
Sort Key: dok.dok_id
-> Seq Scan on dokumendid dok (cost=0.00..8.10 rows=36 width=4)
Filter: ((tyyp = 30) AND (kinnitaja IS NOT NULL))
(28 rows)

Uh.. Mails are getting very lengthy.

Tambet

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message mdc@keko.com.ar 2003-01-14 14:45:12 database broken ?
Previous Message Pedro Igor 2003-01-14 10:53:45 Data between different databases