directory tree query with big planner variation

From: Axel Rau <Axel(dot)Rau(at)Chaos1(dot)DE>
To: pgsql-performance(at)postgresql(dot)org
Subject: directory tree query with big planner variation
Date: 2006-07-31 10:48:11
Message-ID: 79BB9B3D-BCE5-46AA-82DF-BB6B1D0FE2A7@Chaos1.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi group,

this is a directory tree query for a backup system (http://
sourceforge.net/projects/bacula)
You provide a path and get back the names of the children plus a
boolean telling if the child has itself children.
The "%@" stands for the initial path:
---------------------------------------------------------------
explain analyze SELECT X.name AS name, COUNT(CH) > 1 AS children
FROM
( SELECT RTRIM( REPLACE( NLPC.path, '%@/', ''),'/') AS name,
FN.name AS CH
FROM
( SELECT P.path,P.pathid
FROM bacula.path P
WHERE P.path ~ '^%@/[^/]*/$' ) AS NLPC
LEFT OUTER JOIN bacula.file F
ON
NLPC.pathid = F.pathid
LEFT OUTER JOIN bacula.filename FN
ON
F.filenameid = FN.filenameid
GROUP BY NLPC.path, FN.name
UNION
SELECT FN.name AS name, FN.name AS CH
FROM
bacula.path P, bacula.file F, bacula.filename FN
WHERE
P.path = '%@/' AND
P.pathid = F.pathid AND
F.filenameid = FN.filenameid
) AS X
WHERE X.name <> ''
GROUP BY X.name
---------------------------------------------------------------
The 1st part of the union takes care of directories, the 2nd one of
flat files.
Application allows user navigation in a browser (clicking on a name
in one column triggers the query and fills the next browser column).
Initial path of "/Users/axel/Library/Preferences" results in:
---------------------------------------------------------------
Sort (cost=1295.24..1295.47 rows=92 width=64) (actual
time=818.987..819.871 rows=527 loops=1)
Sort Key: upper(x.name)
-> GroupAggregate (cost=1288.56..1292.24 rows=92 width=64)
(actual time=784.069..814.059 rows=527 loops=1)
-> Unique (cost=1288.56..1289.25 rows=92 width=112)
(actual time=783.931..809.708 rows=684 loops=1)
-> Sort (cost=1288.56..1288.79 rows=92 width=112)
(actual time=783.921..793.150 rows=5350 loops=1)
Sort Key: name, ch
-> Append (cost=642.03..1285.55 rows=92
width=112) (actual time=335.134..723.917 rows=5350 loops=1)
-> Subquery Scan "*SELECT*
1" (cost=642.03..643.18 rows=46 width=112) (actual
time=335.130..338.564 rows=184 loops=1)
-> HashAggregate
(cost=642.03..642.72 rows=46 width=112) (actual time=335.121..337.843
rows=184 loops=1)
-> Nested Loop Left Join
(cost=2.00..641.80 rows=46 width=112) (actual time=39.293..326.831
rows=1685 loops=1)
-> Nested Loop Left
Join (cost=0.00..502.63 rows=46 width=97) (actual
time=21.026..202.206 rows=1685 loops=1)
-> Index Scan
using path_name_idx on path p (cost=0.00..3.02 rows=1 width=97)
(actual time=15.480..56.935 rows=27 loops=1)
Index Cond:
((path >= '/Users/axel/Library/Preferences/'::text) AND (path < '/
Users/axel/Library/Preferences0'::text))
Filter:
((path ~ '^/Users/axel/Library/Preferences/[^/]*/$'::text) AND (rtrim
("replace"(path, '/Users/axel/Library/Preferences/'::text, ''::text),
'/'::text) <> ''::text))
-> Index Scan
using file_path_idx on file f (cost=0.00..493.28 rows=506 width=8)
(actual time=0.473..5.119 rows=62 loops=27)
Index Cond:
("outer".pathid = f.pathid)
-> Bitmap Heap Scan on
filename fn (cost=2.00..3.01 rows=1 width=23) (actual
time=0.058..0.061 rows=1 loops=1685)
Recheck Cond:
("outer".filenameid = fn.filenameid)
-> Bitmap Index
Scan on filename_pkey (cost=0.00..2.00 rows=1 width=0) (actual
time=0.030..0.030 rows=1 loops=1685)
Index Cond:
("outer".filenameid = fn.filenameid)
-> Nested Loop (cost=2.00..641.91
rows=46 width=19) (actual time=3.349..377.758 rows=5166 loops=1)
-> Nested Loop (cost=0.00..502.62
rows=46 width=4) (actual time=3.118..97.375 rows=5200 loops=1)
-> Index Scan using
path_name_idx on path p (cost=0.00..3.01 rows=1 width=4) (actual
time=0.045..0.052 rows=1 loops=1)
Index Cond: (path = '/
Users/axel/Library/Preferences/'::text)
-> Index Scan using
file_path_idx on file f (cost=0.00..493.28 rows=506 width=8) (actual
time=3.058..76.014 rows=5200 loops=1)
Index Cond:
("outer".pathid = f.pathid)
-> Bitmap Heap Scan on filename
fn (cost=2.00..3.02 rows=1 width=23) (actual time=0.037..0.039
rows=1 loops=5200)
Recheck Cond:
("outer".filenameid = fn.filenameid)
Filter: (name <> ''::text)
-> Bitmap Index Scan on
filename_pkey (cost=0.00..2.00 rows=1 width=0) (actual
time=0.018..0.018 rows=1 loops=5200)
Index Cond:
("outer".filenameid = fn.filenameid)
Total runtime: 832.458 ms
---------------------------------------------------------------
which is ok, but initial path of "/Users/axel" results in (which is
not ok):
---------------------------------------------------------------
Sort (cost=125533.67..125534.17 rows=200 width=64) (actual
time=84273.963..84274.260 rows=181 loops=1)
Sort Key: upper(x.name)
-> GroupAggregate (cost=123493.01..125526.03 rows=200 width=64)
(actual time=84263.411..84272.427 rows=181 loops=1)
-> Unique (cost=123493.01..124169.51 rows=90201
width=112) (actual time=84263.215..84270.129 rows=522 loops=1)
-> Sort (cost=123493.01..123718.51 rows=90201
width=112) (actual time=84263.208..84265.632 rows=1432 loops=1)
Sort Key: name, ch
-> Append (cost=113172.83..116069.08
rows=90201 width=112) (actual time=83795.274..84251.660 rows=1432
loops=1)
-> Subquery Scan "*SELECT*
1" (cost=113172.83..115426.71 rows=90155 width=112) (actual
time=83795.270..83803.996 rows=410 loops=1)
-> HashAggregate
(cost=113172.83..114525.16 rows=90155 width=112) (actual
time=83795.258..83802.369 rows=410 loops=1)
-> Hash Left Join
(cost=3124.38..112722.06 rows=90155 width=112) (actual
time=56254.547..83779.903 rows=3648 loops=1)
Hash Cond:
("outer".filenameid = "inner".filenameid)
-> Merge Left Join
(cost=0.00..106216.87 rows=90155 width=97) (actual
time=54926.198..82430.621 rows=3648 loops=1)
Merge Cond:
("outer".pathid = "inner".pathid)
-> Index Scan
using path_pkey on path p (cost=0.00..2567.57 rows=1941 width=97)
(actual time=527.805..1521.911 rows=69 loops=1)
Filter:
((path ~ '^/Users/axel/[^/]*/$'::text) AND (rtrim("replace"(path, '/
Users/axel/'::text, ''::text), '/'::text) <> ''::text))
-> Index Scan
using file_path_idx on file f (cost=0.00..95191.99 rows=3020363
width=8) (actual time=17.561..74392.318 rows=2941790 loops=1)
-> Hash
(cost=2723.30..2723.30 rows=160430 width=23) (actual
time=1299.103..1299.103 rows=160430 loops=1)
-> Seq Scan on
filename fn (cost=0.00..2723.30 rows=160430 width=23) (actual
time=3.884..684.918 rows=160430 loops=1)
-> Nested Loop (cost=2.00..641.91
rows=46 width=19) (actual time=93.252..442.196 rows=1022 loops=1)
-> Nested Loop (cost=0.00..502.62
rows=46 width=4) (actual time=49.375..209.694 rows=1050 loops=1)
-> Index Scan using
path_name_idx on path p (cost=0.00..3.01 rows=1 width=4) (actual
time=29.455..29.462 rows=1 loops=1)
Index Cond: (path = '/
Users/axel/'::text)
-> Index Scan using
file_path_idx on file f (cost=0.00..493.28 rows=506 width=8) (actual
time=19.898..175.869 rows=1050 loops=1)
Index Cond:
("outer".pathid = f.pathid)
-> Bitmap Heap Scan on filename
fn (cost=2.00..3.02 rows=1 width=23) (actual time=0.206..0.208
rows=1 loops=1050)
Recheck Cond:
("outer".filenameid = fn.filenameid)
Filter: (name <> ''::text)
-> Bitmap Index Scan on
filename_pkey (cost=0.00..2.00 rows=1 width=0) (actual
time=0.087..0.087 rows=1 loops=1050)
Index Cond:
("outer".filenameid = fn.filenameid)
Total runtime: 84295.927 ms
---------------------------------------------------------------
It happened once that the planner resolved the 2nd query with the 1st
plan, but this is not reproducible.
How can I avoid the 2nd plan?

This is 8.1.4 on OpenBSD 3.9 with 2x1GHz PIII and 2GB.
Axel
Axel Rau, ☀Frankfurt , Germany +49-69-951418-0

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Stone 2006-07-31 11:15:42 Re: directory tree query with big planner variation
Previous Message Richard Huxton 2006-07-31 09:26:55 Re: Query 200x slower on server [PART 2]