From: Riaan Stander <rstander(at)exa(dot)co(dot)za>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Best design for performance
Date: 2017-03-28 12:41:37
Message-ID: 113e6bde-6595-44b8-79a4-992a2eea4c27@exa.co.za
Lists: pgsql-performance
On 28 Mar 2017 4:22 AM, Claudio Freire wrote:
>> From: Claudio Freire [mailto:klaussfreire(at)gmail(dot)com]
>>
>> How did you query the table's size? You're probably failing to account for TOAST tables.
>>
>> I'd suggest using pg_total_relation_size.
> ...
> On Mon, Mar 27, 2017 at 10:17 PM, Riaan Stander <rstander(at)exa(dot)co(dot)za> wrote:
>> I'm using the first query from here.
>> https://wiki.postgresql.org/wiki/Disk_Usage
> Please don't top post.
>
> It's a surprisingly big difference. TOAST could be compressing the
> array, but I wouldn't expect it to be that compressible. Do you have
> any stats about the length of the site array per row?
>
>> The plan is to do the rights checking in the application. The join solution gets used for reports to filter data & client adhoc queries.
> Especially for reporting queries, you want the planner's stats to be
> as accurate as possible, and placing a literal sites arrays in the
> query in my experience is the best way to achieve that. But that is
> indeed limited to reasonably small arrays, thereby the need to have
> both variants to adapt the query to each case.
>
> If you can't afford to do that change at the application level, I
> would expect that the original schema without the array should be
> superior. The array hides useful information from the planner, and
> that *should* hurt you.
>
> You'll have to test with a reasonably large data set, resembling a
> production data set as much as possible.
>
>
I did some more testing on this. My primary concern was that not all the
data was there in the array version, but after some extensive testing it
all seems to be present.
I've done some comparisons vs the SQL Server version too.
SQL Server table with over 700 million records:
CREATE TABLE [dbo].[usrUserRights] (
    [UserId]  [dbo].[dm_Id] NOT NULL,
    [SiteId]  [dbo].[dm_Id] NOT NULL,
    [RightId] [dbo].[dm_Id] NOT NULL,
    CONSTRAINT [pk_usrUserRights_UserId_RightId_SiteId]
        PRIMARY KEY CLUSTERED ([UserId], [RightId], [SiteId])
);
Takes 23 GB for data and 200 MB for indexes.
Postgres table with over 700 million records:
CREATE TABLE security.user_right_site2
(
    user_id  bigint NOT NULL,
    right_id bigint NOT NULL,
    site_id  bigint NOT NULL
);
CREATE INDEX ON security.user_right_site2 (user_id, right_id);
Takes 35 GB data and 26 GB index, for a total of 61 GB.
That is quite a large increase over the SQL Server storage. Am I missing
something? It makes me worry about the rest of the database we still have
to convert.
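For reference, the sizes above can be checked with the catalog size
functions (pg_total_relation_size, as suggested earlier in the thread,
includes indexes and TOAST; the others split it out):

SELECT pg_size_pretty(pg_relation_size('security.user_right_site2'))        AS heap,
       pg_size_pretty(pg_indexes_size('security.user_right_site2'))         AS indexes,
       pg_size_pretty(pg_total_relation_size('security.user_right_site2'))  AS total;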
The Postgres array version ends up with only 600k records, due to the aggregation:
CREATE TABLE security.user_right_site
(
    user_id  bigint NOT NULL,
    right_id bigint NOT NULL,
    sites    bigint[]
);
CREATE INDEX ON security.user_right_site (user_id, right_id);
Takes 339 MB data, 25 MB index, and 2240 MB TOAST.
Regarding the array length for each of these: they currently have at most
6500 site ids per row.
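To answer the earlier question about per-row array length stats, a query
along these lines works (a sketch; cardinality() returns the element count
of the array):

SELECT min(cardinality(sites))             AS min_len,
       round(avg(cardinality(sites)), 1)   AS avg_len,
       max(cardinality(sites))             AS max_len
FROM   security.user_right_site;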
Regards
Riaan