From: | Greg Smith <gsmith(at)gregsmith(dot)com> |
---|---|
To: | Віталій Тимчишин <tivv00(at)gmail(dot)com> |
Cc: | Glenn Maynard <glennfmaynard(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Nested query performance issue |
Date: | 2009-04-09 23:59:41 |
Message-ID: | alpine.GSO.2.01.0904091945020.9649@westnet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thu, 9 Apr 2009, tiv00 wrote:
> create or replace function explode_array(in_array anyarray) returns setof anyelement as
> $$
> select ($1)[s] from generate_series(1,array_upper($1, 1)) as s;
> $$
> language sql immutable;
Note that you can make this function a bit more general by using
array_lower as the bottom bound:
create or replace function explode_array(in_array anyarray) returns setof anyelement as
$$
select ($1)[s] from generate_series
(array_lower($1, 1), array_upper($1, 1)) as s;
$$
language sql immutable;
While you won't run into them in most situations, it is possible to create
arrays where the lower bound isn't 1 by using the subscript syntax. The
example in the manual even shows that somewhat odd possibilities like
assigning something to "myarray[-2:7]" works.
As already pointed out, once you're in 8.4 the windowing functions might
be a better fit here, but 8.4 does have "unnest" built-in that replaces
the need to code this sort of thing yourself. You might want to name this
function accordingly to match that upcoming standard (or not, depending on
whether you want to avoid or be reminding of the potential for using the
built-in). See
http://www.depesz.com/index.php/2008/11/14/waiting-for-84-array-aggregate-and-array-unpacker/
for some examples.
--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>From pgsql-performance-owner(at)postgresql(dot)org Fri Apr 10 02:08:40 2009
Received: from localhost (unknown [200.46.204.183])
by mail.postgresql.org (Postfix) with ESMTP id 33D7463315E
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Fri, 10 Apr 2009 02:08:37 -0300 (ADT)
Received: from mail.postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 46378-10
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Fri, 10 Apr 2009 02:08:24 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from mail.vanten.com (doc.vanten.com [203.216.141.136])
by mail.postgresql.org (Postfix) with ESMTP id E71E363242D
for <pgsql-performance(at)postgresql(dot)org>; Fri, 10 Apr 2009 02:08:23 -0300 (ADT)
Received: from hexagon.office.vanten.com (hexagon.office.vanten.com [192.168.220.16])
by mail.vanten.com (Postfix) with ESMTP id 4FB3F4119
for <pgsql-performance(at)postgresql(dot)org>; Fri, 10 Apr 2009 14:08:20 +0900 (JST)
Received: from Espresso (unknown [192.168.220.80])
by hexagon.office.vanten.com (Postfix) with ESMTP id 95A733C883
for <pgsql-performance(at)postgresql(dot)org>; Fri, 10 Apr 2009 14:08:19 +0900 (JST)
From: "Rainer Mager" <rainer(at)vanten(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
References: <003001c9b645$e8a7cec0$b9f76c40$(at)com> <e7f9235d0904051646j7e8374afncf79e3708ae40c20(at)mail(dot)gmail(dot)com> <003501c9b656$a9c73fe0$fd55bfa0$(at)com> <7371(dot)1238981597(at)sss(dot)pgh(dot)pa(dot)us>
In-Reply-To:
Subject: Re: difficulties with time based queries
Date: Fri, 10 Apr 2009 14:08:56 +0900
Message-ID: <000001c9b99a$743da930$5cb8fb90$(at)com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_0001_01C9B9E5.E4255130"
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Acm2V67taS6N+dZOSV+76Ry0403szgAD8gPgACs3BPAAlg2qoA==
Content-Language: en-us
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=0.001 tagged_above=0 required=5
tests=HTML_MESSAGE=0.001
X-Spam-Level:
X-Archive-Number: 200904/184
X-Sequence-Number: 33551
This is a multipart message in MIME format.
------=_NextPart_000_0001_01C9B9E5.E4255130
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
Thanks for all of the suggestions so far. I've been trying to reduce the
number of indices I have, but I'm running into a problem. I have a need to
do queries on this table with criteria applied to the date and possibly any
or all of the other key columns. As a reminder, here's my table:
Table "public.ad_log"
Column | Type |
Modifiers
------------+-----------------------------+---------------------------------
---------------------------
ad_log_id | integer | not null default
nextval('ad_log_ad_log_id_seq'::regclass)
channel | integer | not null
player | integer | not null
ad | integer | not null
start_time | timestamp without time zone |
end_time | timestamp without time zone |
So, I need indices that make it fast querying against start_time as well as
all possible combinations of channel, player, and ad. Below is a sample
query that uses all of these (note that I've removed actual strings to
protect customer data). The result is fine in terms of speed, but since it's
using the ad_log_ad_date index I'm wondering what the best strategy is to
cover queries that don't specify an ad. Should I have 4 indices, one with
just the start_time (for when no other columns are specified) and the other
three each with the start_time and the three other criteria: channel,
player, and ad? I'm currently experimenting with various options, but since
it takes a couple of hours to create a particular index this is taking a
while.
# explain analyze SELECT ad_log.ad_log_id, channels.name as channel_name,
players.name as player_name, ads.name as ad_name, start_time, end_time,
(data IS NOT NULL) AS has_screenshot FROM channels, players,
players_history, ads, ad_log LEFT OUTER JOIN ad_log_screenshot USING
(ad_log_id) WHERE channel=channels.id AND player=players_history.id AND
players_history.player_instance = players.id AND ad=ads.id AND channels.name
LIKE '<some channel>' AND players.name LIKE '<some player>' AND ads.name
LIKE '<some ad>' AND date(start_time) BETWEEN '2009-01-20' AND
date('2009-01-21') ORDER BY channels.name, players.name, start_time,
ads.name LIMIT 100 OFFSET 100;
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
--------------------------
Limit (cost=7425.26..7425.26 rows=1 width=120) (actual
time=1256.116..1256.202 rows=39 loops=1)
-> Sort (cost=7425.26..7425.26 rows=1 width=120) (actual
time=1255.965..1256.068 rows=139 loops=1)
Sort Key: channels.name, players.name, ad_log.start_time, ads.name
Sort Method: quicksort Memory: 35kB
-> Nested Loop Left Join (cost=0.01..7425.25 rows=1 width=120)
(actual time=179.086..1255.451 rows=139 loops=1)
-> Nested Loop (cost=0.01..7417.06 rows=1 width=88) (actual
time=137.488..1212.531 rows=139 loops=1)
Join Filter: (ad_log.channel = channels.id)
-> Nested Loop (cost=0.01..7415.73 rows=1 width=60)
(actual time=120.308..1192.867 rows=139 loops=1)
Join Filter: (players_history.id = ad_log.player)
-> Nested Loop (cost=0.00..36.92 rows=1
width=17) (actual time=21.960..23.405 rows=1 loops=1)
Join Filter: (players.id =
players_history.player_instance)
-> Seq Scan on players (cost=0.00..11.80
rows=1 width=17) (actual time=5.981..6.083 rows=1 loops=1)
Filter: (name ~~ '<some
player>'::text)
-> Seq Scan on players_history
(cost=0.00..14.50 rows=850 width=8) (actual time=15.880..16.592 rows=850
loops=1)
-> Nested Loop (cost=0.01..7371.03 rows=622
width=51) (actual time=75.161..1156.076 rows=15600 loops=1)
-> Seq Scan on ads (cost=0.00..72.79
rows=1 width=27) (actual time=15.776..31.975 rows=1 loops=1)
Filter: (name ~~ '<some ad>'::text)
-> Index Scan using ad_log_ad_date on
ad_log (cost=0.01..7267.77 rows=2438 width=32) (actual
time=59.375..1095.229 rows=15600 loops=1)
Index Cond: ((ad_log.ad = ads.id) AND
(date(ad_log.start_time) >= '2009-01-20'::date) AND (date(ad_log.start_time)
<= '2009-01-21'::date))
-> Seq Scan on channels (cost=0.00..1.31 rows=1
width=36) (actual time=0.128..0.132 rows=1 loops=139)
Filter: (channels.name ~~ '<some channel>'::text)
-> Index Scan using ad_log_screenshot_pkey on
ad_log_screenshot (cost=0.00..8.18 rows=1 width=36) (actual
time=0.304..0.304 rows=0 loops=139)
Index Cond: (ad_log.ad_log_id =
ad_log_screenshot.ad_log_id)
Total runtime: 1256.572 ms
Thanks,
--Rainer
------=_NextPart_000_0001_01C9B9E5.E4255130
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
<html xmlns:v=3D"urn:schemas-microsoft-com:vml" =
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" =
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dus-ascii">
<meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"MS Mincho";
panose-1:2 2 6 9 4 2 5 8 3 4;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"MS Mincho";
panose-1:2 2 6 9 4 2 5 8 3 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.5pt;
font-family:Consolas;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:Consolas;}
span.EmailStyle19
{mso-style-type:personal;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page Section1
{size:8.5in 11.0in;
margin:99.25pt 92.4pt 85.05pt 92.4pt;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D"edit">
<o:idmap v:ext=3D"edit" data=3D"1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=3DEN-US link=3Dblue vlink=3Dpurple>
<div class=3DSection1>
<p class=3DMsoNormal><span style=3D'color:#1F497D'>Thanks for all of the
suggestions so far. I’ve been trying to reduce the number of =
indices I have,
but I’m running into a problem. I have a need to do queries on =
this table
with criteria applied to the date and possibly any or all of the other =
key
columns. As a reminder, here’s my table:<o:p></o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
&=
nbsp;
Table "public.ad_log"<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>
Column
|
Type &nb=
sp;
| =
Modifiers<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>------------+-----------------------------+----------=
--------------------------------------------------<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> ad_log_id
|
integer =
| not null default =
nextval('ad_log_ad_log_id_seq'::regclass)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> channel
|
integer =
| not null<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> player
|
integer =
| not null<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> ad &nb=
sp;
|
integer =
| not null<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> start_time
| timestamp without time zone |<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> end_time
| timestamp without time zone |<o:p></o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'color:#1F497D'>So, I need indices =
that make it
fast querying against start_time as well as all possible combinations of
channel, player, and ad. Below is a sample query that uses all of these =
(note
that I’ve removed actual strings to protect customer data). The =
result is
fine in terms of speed, but since it’s using the ad_log_ad_date =
index I’m
wondering what the best strategy is to cover queries that don’t =
specify
an ad. Should I have 4 indices, one with just the start_time (for when =
no other
columns are specified) and the other three each with the start_time and =
the
three other criteria: channel, player, and ad? I’m currently
experimenting with various options, but since it takes a couple of hours =
to
create a particular index this is taking a while.<o:p></o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>#
explain analyze SELECT ad_log.ad_log_id, channels.name as channel_name,
players.name as player_name, ads.name as ad_name, start_time, end_time, =
(data
IS NOT NULL) AS has_screenshot FROM channels, players, players_history, =
ads, ad_log
LEFT OUTER JOIN ad_log_screenshot USING (ad_log_id) WHERE =
channel=3Dchannels.id
AND player=3Dplayers_history.id AND players_history.player_instance =3D =
players.id
AND ad=3Dads.id AND channels.name LIKE '<some channel>' AND =
players.name
LIKE '<some player>' AND ads.name LIKE '<some ad>' AND =
date(start_time)
BETWEEN '2009-01-20' AND date('2009-01-21') ORDER BY channels.name,
players.name, start_time, ads.name LIMIT 100 OFFSET =
100;<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
&=
nbsp; &n=
bsp; &nb=
sp; &nbs=
p;  =
;
QUERY PLAN<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>-----------------------------------------------------=
-------------------------------------------------------------------------=
----------------------------------------------------<o:p></o:p></span></p=
>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> Limit
(cost=3D7425.26..7425.26 rows=3D1 width=3D120) (actual =
time=3D1256.116..1256.202
rows=3D39 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>
-> Sort (cost=3D7425.26..7425.26 rows=3D1 width=3D120) =
(actual
time=3D1255.965..1256.068 rows=3D139 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>
Sort Key: channels.name, players.name, ad_log.start_time, =
ads.name<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>
Sort Method: quicksort Memory: 35kB<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>
-> Nested Loop Left Join (cost=3D0.01..7425.25 rows=3D1 =
width=3D120)
(actual time=3D179.086..1255.451 rows=3D139 =
loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
;
-> Nested Loop (cost=3D0.01..7417.06 rows=3D1 width=3D88) =
(actual
time=3D137.488..1212.531 rows=3D139 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
;
Join Filter: (ad_log.channel =3D channels.id)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
;
-> Nested Loop (cost=3D0.01..7415.73 rows=3D1 width=3D60) =
(actual
time=3D120.308..1192.867 rows=3D139 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
Join Filter: (players_history.id =3D =
ad_log.player)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
-> Nested Loop (cost=3D0.00..36.92 rows=3D1 width=3D17) =
(actual
time=3D21.960..23.405 rows=3D1 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
Join Filter: (players.id =3D =
players_history.player_instance)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
-> Seq Scan on players (cost=3D0.00..11.80 rows=3D1 =
width=3D17)
(actual time=3D5.981..6.083 rows=3D1 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
&=
nbsp;
Filter: (name ~~ '<some player>'::text)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
-> Seq Scan on players_history (cost=3D0.00..14.50 =
rows=3D850
width=3D8) (actual time=3D15.880..16.592 rows=3D850 =
loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
-> Nested Loop (cost=3D0.01..7371.03 rows=3D622 =
width=3D51) (actual
time=3D75.161..1156.076 rows=3D15600 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
-> Seq Scan on ads (cost=3D0.00..72.79 rows=3D1 =
width=3D27) (actual
time=3D15.776..31.975 rows=3D1 loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
&=
nbsp;
Filter: (name ~~ '<some ad>'::text)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
-> Index Scan using ad_log_ad_date on ad_log =
(cost=3D0.01..7267.77
rows=3D2438 width=3D32) (actual time=3D59.375..1095.229 rows=3D15600 =
loops=3D1)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
&=
nbsp;
Index Cond: ((ad_log.ad =3D ads.id) AND (date(ad_log.start_time) >=3D
'2009-01-20'::date) AND (date(ad_log.start_time) <=3D =
'2009-01-21'::date))<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
;
-> Seq Scan on channels (cost=3D0.00..1.31 rows=3D1 =
width=3D36)
(actual time=3D0.128..0.132 rows=3D1 loops=3D139)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
; =
Filter: (channels.name ~~ '<some =
channel>'::text)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
;
-> Index Scan using ad_log_screenshot_pkey on =
ad_log_screenshot
(cost=3D0.00..8.18 rows=3D1 width=3D36) (actual time=3D0.304..0.304 =
rows=3D0 loops=3D139)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'>  =
;
Index Cond: (ad_log.ad_log_id =3D =
ad_log_screenshot.ad_log_id)<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-family:"Courier =
New";color:#1F497D'> Total
runtime: 1256.572 ms<o:p></o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'>Thanks,<o:p></o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span =
style=3D'color:#1F497D'>--Rainer<o:p></o:p></span></p>
</div>
</body>
</html>
------=_NextPart_000_0001_01C9B9E5.E4255130--
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2009-04-10 05:20:45 | Re: difficulties with time based queries |
Previous Message | Glenn Maynard | 2009-04-09 23:42:41 | Re: Nested query performance issue |