Re: query optimization question

From: Jack Coates <jack(at)lyris(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query optimization question
Date: 2004-01-29 21:23:28
Message-ID: 1075411407.7494.130.camel@cletus.lyris.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 2004-01-29 at 11:31, Tom Lane wrote:
> Jack Coates <jack(at)lyris(dot)com> writes:
> > jackdb=# explain SELECT DISTINCT members_.memberid_
> > jackdb-# FROM members_
> > jackdb-# WHERE ( members_.List_='list1'
> > jackdb(# AND members_.MemberType_='normal'
> > jackdb(# AND members_.SubType_='mail'
> > jackdb(# AND members_.emailaddr_ IS NOT NULL )
> > jackdb-# GROUP BY memberid_ HAVING (
>
> Um, that's not what I had in mind at all. Does GROUP BY actually do
> anything at all here? (You didn't answer me as to whether memberid_
> is a unique identifier or not, but if it is, this GROUP BY is just an
> expensive no-op.)
>

Sorry for the misunderstanding. It should be unique, yes.

> What I was envisioning was pulling the sub-selects up to the top level
> and using grouping to calculate the count(*) values for all memberids
> in parallel. Roughly speaking it would look like (again assuming
> memberid_ is unique)
>
> SELECT memberid_ FROM
> (
> SELECT memberid_ FROM lyrActiveRecips, members_, outmail
> WHERE (all the conditions for this case)
> UNION ALL
> SELECT memberid_ FROM lyrCompletedRecips, members_, outmail
> WHERE (all the conditions for this case)
> )
> GROUP BY memberid_ HAVING count(*) = 3;
>
> However, if you can't change the boilerplate part of your query then
> this is all blue-sky speculation anyway.

Got it now -- I'm running into some subquery errors trying to implement
this, anyway.

> What I'm actually more
> interested in is your statement that MSSQL can do the original query
> quickly. I find that a bit hard to believe because I don't see any
> relevant optimization techniques. Do they have any equivalent to
> EXPLAIN that would give some hint how they're doing it?

yup -- here it is. It will probably be a nasty mess after linewrap gets
done with it, so let me know if you'd like me to post a copy on ftp.

SELECT DISTINCT members_.memberid_ FROM members_ WHERE (
members_.List_='list1' AND members_.MemberType_='normal' AND
members_.SubType_='mail' ) GROUP BY memberid_ HAVING ( ( select
count(*) from lyrActiveRecips, outmail_ where
outmail 11 1 0 NULL NULL 1 NULL 102274.5 NULL NULL NULL 104.10356 NULL NULL SELECT 0 NULL
|--Parallelism(Gather Streams) 11 2 1 Parallelism Gather
Streams NULL NULL 102274.5 0.0 0.22011127 23 104.10356 [members_].[MemberID_] NULL PLAN_ROW -1 1.0
|--Filter(WHERE:(If ([Expr1006] IS NULL) then 0 else
[Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3)) 11 3 2 Filter Filter WHERE:(If ([Expr1006] IS NULL) then
0 else [Expr1006]+If ([Expr1012] IS NULL) then 0 else
[Expr1012]=3) NULL 102274.5 0.0 3.5393338 23 103.88345 [members_].[MemberID_] NULL PLAN_ROW -1 1.0
|--Hash Match(Right Outer Join,
HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]),
RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID])) 11 4 3 Hash Match Right Outer Join HASH:([lyrCompletedRecips].[MemberID])=([members_].[MemberID_]), RESIDUAL:([members_].[MemberID_]=[lyrCompletedRecips].[MemberID]) NULL 4782883.5 0.0 21.874712 23 100.34412 [members_].[MemberID_], [Expr1006], [Expr1012] NULL PLAN_ROW -1 1.0
|--Compute
Scalar(DEFINE:([Expr1012]=Convert([Expr1020]))) 11 5 4 Compute
Scalar Compute
Scalar DEFINE:([Expr1012]=Convert([Expr1020])) [Expr1012]=Convert([Expr1020]) 119575.35 0.0 1.3723248 15 4.3749919 [lyrCompletedRecips].[MemberID], [Expr1012] NULL PLAN_ROW -1 1.0
| |--Hash Match(Aggregate,
HASH:([lyrCompletedRecips].[MemberID]),
RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) DEFINE:([Expr1020]=COUNT(*))) 11 6 5 Hash Match Aggregate HASH:([lyrCompletedRecips].[MemberID]), RESIDUAL:([lyrCompletedRecips].[MemberID]=[lyrCompletedRecips].[MemberID]) [Expr1020]=COUNT(*) 119575.35 0.0 1.3723248 15 4.3749919 [lyrCompletedRecips].[MemberID], [Expr1020] NULL PLAN_ROW -1 1.0
| |--Parallelism(Repartition Streams, PARTITION
COLUMNS:([lyrCompletedRecips].[MemberID])) 11 7 6 Parallelism Repartition Streams PARTITION COLUMNS:([lyrCompletedRecips].[MemberID]) NULL 119640.6 0.0 0.32407209 173 3.002667 [lyrCompletedRecips].[MemberID] NULL PLAN_ROW -1 1.0
| |--Nested Loops(Inner Join, OUTER
REFERENCES:([outmail_].[MessageID_])) 11 8 7 Nested Loops Inner
Join OUTER
REFERENCES:([outmail_].[MessageID_]) NULL 119640.6 0.0 0.75014657 173 2.6785948 [lyrCompletedRecips].[MemberID] NULL PLAN_ROW -1 1.0
| |--Parallelism(Distribute
Streams) 11 9 8 Parallelism Distribute
Streams NULL NULL 1.0 0.0 2.8501874E-2 128 9.4664574E-2 [outmail_].[MessageID_] NULL PLAN_ROW -1 1.0
| | |--Clustered Index
Scan(OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list')) 11 10 9 Clustered Index
Scan Clustered Index
Scan OBJECT:([lmdb].[dbo].[outmail_].[IX_outmail_list]),
WHERE:([outmail_].[Type_]='list') [outmail_].[Type_],
[outmail_].[MessageID_] 1.0 0.01878925 3.9800001E-5 128 3.7658099E-2 [outmail_].[Type_], [outmail_].[MessageID_] NULL PLAN_ROW 0 1.0
| |--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]), SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300 AN 11 11 8 Clustered Index Seek Clustered Index Seek OBJECT:([lmdb].[dbo].[lyrCompletedRecips].[IX_CompletedRecipsMailingID]), SEEK:([lyrCompletedRecips].[MailingID]=[outmail_].[MessageID_]), WHERE:([lyrCompletedRecips].[CompletionStatusID]=300 AND [lyrCompletedRecips].[FinalAttempt]>'Jan 20 2004 12:00AM') [lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt], [lyrCompletedRecips].[MemberID] 119640.6 0.5750553 0.13207871 53 1.5463468 [lyrCompletedRecips].[CompletionStatusID], [lyrCompletedRecips].[FinalAttempt], [lyrCompletedRecips].[MemberID] NULL PLAN_ROW -1 3.0
|--Parallelism(Repartition Streams, PARTITION
COLUMNS:([members_].[MemberID_])) 11 19 4 Parallelism Repartition
Streams PARTITION
COLUMNS:([members_].[MemberID_]) NULL 4782883.5 0.0 15.474822 19 74.094414 [members_].[MemberID_], [Expr1006] NULL PLAN_ROW -1 1.0
|--Nested Loops(Left Outer Join,
WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID])) 11 20 19 Nested Loops Left Outer Join WHERE:([members_].[MemberID_]=[lyrActiveRecips].[MemberID]) NULL 4782883.5 0.0 9.9962263 19 58.619591 [members_].[MemberID_], [Expr1006] NULL PLAN_ROW -1 1.0
|--Clustered Index
Seek(OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD) 11 22 20 Clustered Index
Seek Clustered Index
Seek OBJECT:([lmdb].[dbo].[members_].[IX_members_List_EmailLC]),
SEEK:([members_].[List_]='list1'),
WHERE:([members_].[MemberType_]='normal' AND
[members_].[SubType_]='mail') ORDERED FORWARD [members_].[SubType_],
[members_].[MemberType_],
[members_].[MemberID_] 4782883.5 40.160122 3.2745986 410 43.434719 [members_].[SubType_], [members_].[MemberType_], [members_].[MemberID_] NULL PLAN_ROW -1 1.0
|--Table Spool 11 24 20 Table Spool Lazy
Spool NULL NULL 1.0 1.6756756E-2 3.7999999E-7 15 0.90211391 [lyrActiveRecips].[MemberID], [Expr1006] NULL PLAN_ROW -1 4782883.5
|--Compute
Scalar(DEFINE:([Expr1006]=Convert([Expr1021]))) 11 25 24 Compute
Scalar Compute
Scalar DEFINE:([Expr1006]=Convert([Expr1021])) [Expr1006]=Convert([Expr1021]) 1.0 0.0 7.6000001E-6 15 2.4437904E-2 [lyrActiveRecips].[MemberID], [Expr1006] NULL PLAN_ROW -1 1.0
|--Stream Aggregate(GROUP
BY:([lyrActiveRecips].[MemberID])
DEFINE:([Expr1021]=Count(*))) 11 26 25 Stream Aggregate Aggregate GROUP
BY:([lyrActiveRecips].[MemberID]) [Expr1021]=Count(*) 1.0 0.0 7.6000001E-6 15 2.4437904E-2 [lyrActiveRecips].[MemberID], [Expr1021] NULL PLAN_ROW -1 1.0
|--Sort(ORDER
BY:([lyrActiveRecips].[MemberID] ASC)) 11 27 26 Sort Sort ORDER
BY:([lyrActiveRecips].[MemberID]
ASC) NULL 1.0 1.1261261E-2 1.00011E-4 11 2.4430305E-2 [lyrActiveRecips].[MemberID] NULL PLAN_ROW -1 1.0

|--Filter(WHERE:([outmail_].[Type_]='list')) 11 28 27 Filter Filter WHERE:([outmail_].[Type_]='list') NULL 1.0 0.0 4.7999998E-7 156 1.3069032E-2 [lyrActiveRecips].[MemberID] NULL PLAN_ROW -1 1.0
|--Bookmark
Lookup(BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_])) 11 29 28 Bookmark Lookup Bookmark
Lookup BOOKMARK:([Bmk1004]),
OBJECT:([lmdb].[dbo].[outmail_]) [outmail_].[Type_] 1.0 3.1249749E-3 0.0000011 156 1.3068552E-2 [lyrActiveRecips].[MemberID], [outmail_].[Type_] NULL PLAN_ROW -1 1.0
|--Nested
Loops(Inner Join, OUTER
REFERENCES:([lyrActiveRecips].[MailingID])) 11 30 29 Nested Loops Inner
Join OUTER
REFERENCES:([lyrActiveRecips].[MailingID]) NULL 1.0 0.0 0.00001254 138 9.9424766E-3 [lyrActiveRecips].[MemberID], [Bmk1004] NULL PLAN_ROW -1 1.0

|--Bookmark Lookup(BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips])) 11 31 30 Bookmark
Lookup Bookmark Lookup BOOKMARK:([Bmk1002]),
OBJECT:([lmdb].[dbo].[lyrActiveRecips]) [lyrActiveRecips].[MemberID],
[lyrActiveRecips].[MailingID] 1.0 3.1249749E-3 0.0000011 53 6.4091529E-3 [lyrActiveRecips].[MemberID], [lyrActiveRecips].[MailingID] NULL PLAN_ROW -1 1.0
|
|--Index
Seek(OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD) 11 32 31 Index Seek Index
Seek OBJECT:([lmdb].[dbo].[lyrActiveRecips].[jacktest_lar_date_ix]),
SEEK:([lyrActiveRecips].[NextAttempt] > 'Jan 20 2004 12:00AM') ORDERED
FORWARD [Bmk1002] 1.0 3.2034749E-3 7.9603E-5 40 3.2830781E-3 [Bmk1002] NULL PLAN_ROW -1 1.0
|--Index
Seek(OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD) 11 33 30 Index Seek Index
Seek OBJECT:([lmdb].[dbo].[outmail_].[PK_outmail_]),
SEEK:([outmail_].[MessageID_]=[lyrActiveRecips].[MailingID]) ORDERED
FORWARD [Bmk1004] 1.0 3.2034749E-3 7.9603E-5 93 3.520784E-3 [Bmk1004] NULL PLAN_ROW -1 3.0
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, jack(at)lyris(dot)com
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2004-01-29 21:37:16 Re: Explain plan for 2 column index
Previous Message Bill Moran 2004-01-29 19:44:28 Re: [PERFORM] Set-Returning Functions WAS: On the performance of