From: | "Riaan Stander" <rstander(at)exa(dot)co(dot)za> |
---|---|
To: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Best design for performance |
Date: | 2017-03-27 23:43:37 |
Message-ID: | 04c401d2a753$f4b78e00$de26aa00$@exa.co.za |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Good day
At my company we're busy converting a product from using SQL Server to
Postgres. One part of the old design involves filtering data for the rights
a user has.
The SQL Server table looked like this:
CREATE TABLE [dbo].[usrUserRights] (
[UserId] [dbo].[dm_Id] NOT NULL,
[SiteId] [dbo].[dm_Id] NOT NULL,
[RightId] [dbo].[dm_Id] NOT NULL,
CONSTRAINT [pk_usrUserRights_UserId_RightId_SiteId] PRIMARY KEY
CLUSTERED([UserId],[RightId],[SiteId])
);
All data in other tables would have a SiteId. Users would be assigned rights
for certain Sites. We would then be able to filter data with a join.
Example:
SELECT Id, Code FROM SomeTable st
JOIN usrUserRights ur ON st.SiteId = ur.SiteId AND ur.UserId = @UserId AND
ur.RightId = @TheRightRequired
The one design flaw with this is that the table gets extremely large. At our
largest client this table contains over 700mil records. For a single user
with lots of rights there could be 7mil records to cover their rights.
In Postgres I was thinking of going with a design like this
CREATE TABLE security.user_right_site
(
user_id bigint NOT NULL,
right_id bigint NOT NULL,
sites bigint[]
);
create index on security.user_right_site(user_id, right_id);
This drastically cut down on the number of records in the table. It also
seems to make a massive change to the storage requirements.
The old design requires 61GB vs 2.6GB.
My one concern is regarding the limitations of the array type in Postgres.
Is there a point at which one should not use it? Currently our largest
client has 6000+ sites, meaning that the array would contain that many
items. What would the upper feasible limit be in Postgres?
Regarding queries to filter data against this table in Postgres. Any advice
for the best method. I've done some testing myself, but just want to know if
there are other alternatives.
Attempt 1, using Any (250ms)
select a.id, a.code, a.description from ara.asset a
join security.user_right_site urs on urs.user_id = 1783 and urs.right_id
= 10000 and a.site_id = any(urs.sites)
where a.is_historical = true;
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------------------------
Nested Loop (cost=1000.42..22712.71 rows=4 width=47)
Join Filter: (a.site_id = ANY (urs.sites))
-> Gather (cost=1000.00..22599.49 rows=4191 width=55)
Workers Planned: 3
-> Parallel Seq Scan on asset a (cost=0.00..21180.39 rows=1352
width=55)
Filter: is_historical
-> Materialize (cost=0.42..8.45 rows=1 width=530)
-> Index Scan using user_right_site_user_id_right_id_idx on
user_right_site urs (cost=0.42..8.45 rows=1 width=530)
Index Cond: ((user_id = 1783) AND (right_id = 10000))
(9 rows)
Attempt 2, using CTE (65ms)
with sites as
(
select unnest(sites) AS site_id from security.user_right_site where
user_id = 1783 and right_id = 10000
)
select a.id, a.code, a.description from ara.asset a
join sites s on a.site_id = s.site_id
where a.is_historical = true;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------
Hash Join (cost=1012.19..22628.68 rows=41 width=47)
Hash Cond: (a.site_id = s.site_id)
CTE sites
-> Index Scan using user_right_site_user_id_right_id_idx on
user_right_site (cost=0.42..8.94 rows=100 width=8)
Index Cond: ((user_id = 1783) AND (right_id = 10000))
-> Gather (cost=1000.00..22599.49 rows=4191 width=55)
Workers Planned: 3
-> Parallel Seq Scan on asset a (cost=0.00..21180.39 rows=1352
width=55)
Filter: is_historical
-> Hash (cost=2.00..2.00 rows=100 width=8)
-> CTE Scan on sites s (cost=0.00..2.00 rows=100 width=8)
(11 rows)
Attempt 3, using sub select (65ms)
select a.id, a.code, a.description from
(select unnest(sites) AS site_id from security.user_right_site where user_id
= 1783 and right_id = 10000) sites
join ara.asset a on sites.site_id = a.site_id
where a.is_historical = true;
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------
Gather (cost=1011.19..22209.86 rows=128 width=47)
Workers Planned: 3
-> Hash Join (cost=11.19..21197.06 rows=41 width=47)
Hash Cond: (a.site_id = (unnest(user_right_site.sites)))
-> Parallel Seq Scan on asset a (cost=0.00..21180.39 rows=1352
width=55)
Filter: is_historical
-> Hash (cost=9.94..9.94 rows=100 width=8)
-> Index Scan using user_right_site_user_id_right_id_idx on
user_right_site (cost=0.42..8.94 rows=100 width=8)
Index Cond: ((user_id = 1783) AND (right_id = 10000))
(9 rows)
Regards
Riaan Stander
From | Date | Subject | |
---|---|---|---|
Next Message | Claudio Freire | 2017-03-28 00:42:04 | Re: Best design for performance |
Previous Message | Merlin Moncure | 2017-03-27 13:27:37 | Re: Postgres not using all RAM (Huge Page activated on a 96GB RAM system) |