From: | Laurent Martelli <laurent(at)aopsys(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Query involving views |
Date: | 2004-06-06 21:12:07 |
Message-ID: | 87llj0tmqg.fsf@stan.aopsys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
Tom> Laurent Martelli <laurent(at)aopsys(dot)com> writes:
>> Now, if I use the following view to abstract access rights:
>> CREATE VIEW userpictures (
>> PictureID,RollID,FrameID,Description,Filename,
>> Owner,EntryDate,Date,
>> NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates, UserID) AS
>> SELECT DISTINCT ON (Permissions.PictureID,UserID)
>> Pictures.PictureID,RollID,FrameID,Description,Filename,Owner,
>> EntryDate,Date,NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates,
>> UserID FROM Permissions JOIN Groupsdef using (GroupID) JOIN
>> pictures using (PictureID);
>> [ performance sucks ]
Tom> Find a way to get rid of the DISTINCT ON. That's essentially
Tom> an optimization fence. Worse, the way you are using it here,
Tom> it doesn't even give well-defined results, since there's no
Tom> ORDER BY constraining which row will be selected out of a set
Tom> of duplicates. (I think it may not matter to you, since you
Tom> don't really care which groupsdef row is selected,
That's true. I do not use columns from groupsdef in the end.
Tom> but in general a view constructed like this is broken.)
Tom> It might work to do the view as
Tom> SELECT ... all that stuff ... FROM pictures p, users u WHERE
Tom> EXISTS (SELECT 1 FROM permissions prm, groupsdef g WHERE
Tom> p.pictureid = prm.pictureid AND prm.groupid = g.groupid AND
Tom> g.userid = u.userid);
Tom> I'm not sure offhand about the performance properties of this
Tom> either, but it would be worth trying.
This one does not yield very good performance. In fact, the best
performances I have is when I use a where clause like this one:
WHERE PictureID IN
(SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
WHERE groupsdef.UserID=2)
But it's not as elegant to write as the initial view using "distinct
on". I could create a view like this:
CREATE VIEW userpictures (PictureID,UserID)
AS SELECT pictureid,userid
FROM permissions JOIN groupsdef USING(GroupID)
and then do queries like this:
SELECT * FROM pictures
WHERE PictureID IN (SELECT PictureID FROM userpictures WHERE UserID=2)
but it's stillnot as elegant as
SELECT * FROM userpictures WHERE UserID=2
I think I'll try a function:
CREATE FUNCTION picturesID(int) RETURNS SETOF int AS '
SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
WHERE groupsdef.UserID=$1
' LANGUAGE sql;
SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2));
Here's something funny: using a function seems gives slihtly better results
than inlining the query (I did a dozen of runs and the timings were consistent):
SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2));
QUERY PLAN
Hash Join (cost=15.50..100.49 rows=200 width=97) (actual time=28.609..46.568 rows=2906 loops=1)
Hash Cond: ("outer".pictureid = "inner".picturesid)
-> Seq Scan on pictures (cost=0.00..68.33 rows=2933 width=97) (actual time=0.018..2.610 rows=2933 loops=1)
-> Hash (cost=15.00..15.00 rows=200 width=4) (actual time=28.467..28.467 rows=0 loops=1)
-> HashAggregate (cost=15.00..15.00 rows=200 width=4) (actual time=23.698..26.201 rows=2906 loops=1)
-> Function Scan on picturesid (cost=0.00..12.50 rows=1000 width=4) (actual time=16.202..19.952 rows=5076 loops=1)
Total runtime: 48.601 ms
SELECT * FROM pictures WHERE PictureID IN (
SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
WHERE groupsdef.UserID=2);
QUERY PLAN
Hash Join (cost=394.93..504.24 rows=2632 width=97) (actual time=35.770..53.574 rows=2906 loops=1)
Hash Cond: ("outer".pictureid = "inner".pictureid)
-> Seq Scan on pictures (cost=0.00..68.33 rows=2933 width=97) (actual time=0.014..2.543 rows=2933 loops=1)
-> Hash (cost=388.35..388.35 rows=2632 width=4) (actual time=35.626..35.626 rows=0 loops=1)
-> HashAggregate (cost=388.35..388.35 rows=2632 width=4) (actual time=30.988..33.502 rows=2906 loops=1)
-> Merge Join (cost=5.40..376.72 rows=4652 width=4) (actual time=0.247..26.628 rows=5076 loops=1)
Merge Cond: ("outer".groupid = "inner".groupid)
-> Index Scan using permissions_groupid_key on permissions (cost=0.00..280.77 rows=8305 width=8) (actual time=0.031..11.629 rows=7633 loops=1)
-> Sort (cost=5.40..5.43 rows=12 width=4) (actual time=0.207..1.720 rows=5078 loops=1)
Sort Key: groupsdef.groupid
-> Seq Scan on groupsdef (cost=0.00..5.19 rows=12 width=4) (actual time=0.030..0.182 rows=11 loops=1)
Filter: (userid = 2)
Total runtime: 54.748 ms
Tom> A cruder answer is just to accept that the view may give you
Tom> multiple hits, and put the DISTINCT in the top-level query.
I thought of that. But it has the drawback that if you use an ORDER
BY, you must have the same columns in the DISTINCT.
Tom> I think though that in the long run you're going to need to
Tom> rethink this representation of permissions. It's nice and
Tom> simple but it's not going to scale well. Even your "fast"
Tom> query is going to look like a dog once you get to many
Tom> thousands of permission entries.
Tom> It might work to maintain a derived table (basically a
Tom> materialized view) of the form (userid, groupid, pictureid)
Tom> signifying that a user can access a picture through membership
Tom> in a group. Put a nonunique index on (userid, pictureid) on
Tom> it. This could then drive the EXISTS test efficiently.
I'll probably do that if perf goes down when the database grows
bigger.
Thanks for all the advice.
Best regards,
Laurent
--
Laurent Martelli
laurent(at)aopsys(dot)com Java Aspect Components
http://www.aopsys.com/ http://jac.objectweb.org
From | Date | Subject | |
---|---|---|---|
Next Message | Markus Schaber | 2004-06-06 22:08:00 | Re: Using a COPY...FROM through JDBC? |
Previous Message | Tom Lane | 2004-06-06 16:22:25 | Re: Query involving views |