From: | Andrew Okhmat <andy(at)supportex(dot)net> |
---|---|
To: | Thomas Simpson <ts(at)talentstack(dot)to>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Row level security |
Date: | 2024-06-27 10:45:04 |
Message-ID: | d094a87d-9d63-46c9-8c27-631f881b80fb@supportex.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello Jim,
Your approach of using table inheritance in PostgreSQL for implementing
row-level security (RLS) has some interesting aspects, but there are
potential pitfalls and alternatives that you should consider. Below,
I'll outline some key points to
Table Inheritance and Performance Concerns
* RLS and Inheritance - In PostgreSQL, RLS policies are applied per
table. If you use inheritance, RLS policies defined on the parent table
won’t automatically apply to the child tables. You’ll have to set up RLS
policies on each child table separately.
* Growing Base Table - The base table, getting a new row for every row
inserted in the child tables, will grow really fast. Managing a table
with hundreds of millions of rows per year could become a serious
performance problem.
* Partitioning - Partitioning can help manage big tables by breaking
them into smaller parts. But if your base table becomes a bottleneck,
partitioning the child tables alone might not solve the problem.
Alternative Approach: Use Partitioned Tables Directly with RLS
Given your needs, here's a different approach that leverages
PostgreSQL's partitioning and indexing features along with RLS:
* Directly Partitioned Tables - Instead of inheritance, create
partitioned tables directly for each type of data. Partition these
tables based on a logical key (like time, site ID, or customer ID) so
each partition stays manageable:
Example:
CREATE TABLE data (
id SERIAL PRIMARY KEY,
site_id INT,
customer_id INT,
division_id INT,
department_id INT,
data_payload JSONB,
created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);
* RLS Policies on Partitions - Set up RLS policies on each partition.
Since partitions are smaller, RLS policy checks should be more efficient.
Example:
CREATE POLICY rls_policy ON data
USING (site_id = current_setting('app.current_site_id')::INT);
ENABLE ROW LEVEL SECURITY;
* Session Variables - Using PostgreSQL session variables to store
user-specific info (like /app.current_site_id/) is convenient, but has
potential security risks. If a client can set these variables, they
could manipulate them to gain unauthorized access. To mitigate this,
ensure that only trusted parts of your application can set these
variables. Consider using server-side functions or application logic to
securely set these variables based on the authenticated user's information.
Example:
SET app.current_site_id = '123';
* Indexing - Make sure you index columns used in RLS policies and
queries, like /site_id/ and /customer_id/.
CREATE INDEX idx_site_id ON data (site_id);
CREATE INDEX idx_customer_id ON data (customer_id);
CREATE INDEX idx_site_customer ON data (site_id, customer_id);
CREATE INDEX idx_created_at ON data (created_at);
* Using Stored Procedures - using stored procedures can centralize
security logic, but it can also add complexity. Here's a brief look:
Advantages:
- Centralized security logic.
- Additional layer of security as logic is hidden from end-users.
- Can include data validation and business logic.
2Disadvantages:
- Increased complexity in development and maintenance.
- Potential performance overhead for complex procedures.
- Less flexibility for ad-hoc queries.
Example:
CREATE OR REPLACE FUNCTION insert_data(
p_site_id INT,
p_customer_id INT,
p_division_id INT,
p_department_id INT,
p_data_payload JSONB,
p_created_at TIMESTAMPTZ
) RETURNS VOID AS $$
BEGIN
IF current_setting('app.current_site_id')::INT = p_site_id THEN
INSERT INTO data (site_id, customer_id, division_id,
department_id, data_payload, created_at)
VALUES (p_site_id, p_customer_id, p_division_id,
p_department_id, p_data_payload, p_created_at);
ELSE
RAISE EXCEPTION 'Access Denied';
END IF;
END;
$$ LANGUAGE plpgsql;
* Final Thoughts
Using direct partitioning and applying RLS policies to each partition
should help with performance issues linked to a growing base table. This
approach also keeps things flexible for future expansions and avoids the
hassle of managing inheritance hierarchies. Proper indexing with RLS
policies in mind can greatly improve query performance in large tables.
Just make sure to handle session variables securely to avoid potential
security issues.
If you have more questions or need further advice on implementation,
just let me know!
Cheers,
Andy
On 25-Jun-24 00:28, Thomas Simpson wrote:
>
> Hi,
>
> I'm trying to implement a system which requires row level security on
> some key data tables (most do not require RLS). The data tables will
> grow substantially (rows likely > +100M/year - the system is > 80%
> data insert plus < 20% updates and by design, no deletes).
>
> Some queries are likely to brush past many rows before being
> eliminated by the RLS policy, so I'm trying to find the most efficient
> way that does not compromise query times. I also want to have a
> unified approach across all the RLS data to make the policy
> implementation as straightforward as possible too because I know there
> will be future expansion of the RLS rules.
>
> My thought currently is that table inheritance could possibly be one
> way forward. Specifically the base table holding just the RLS
> attributes, such as site group, site ID, customer group, customer ID
> as some initial examples (I expect company division, department may be
> future needs too).
>
> With the RLS attributes on the base table, I can add future needs to
> that table and they automatically propagate to the child tables
> holding the RLS data. Policies on the child tables can enforce row
> visibility based on session tokens assigned at login (a future problem
> avoided just now for simplicity).
>
> I have a small prototype working, with the policy function comparing
> the columns (from the base table) to the user tokens to permit/deny
> row access. This allows this to be as in-memory and hopefully as fast
> as possible as it avoids needing to do any lookups to other tables or
> anything more expensive than some 'permissionColumn IN
> listOfTokensHeldByTheSession' checks.
>
> My concern is the base table will grow substantially faster than the
> child data tables as that receives a new row for every row inserted in
> any of the child tables, so could easily be +300M rows/year and this
> could become some performance fence. Some of the child tables have a
> clear partition key available so inherited & partitioned is also
> appealing but could possibly amplify any performance issue further.
>
> Does this approach sound viable or are there pitfalls or a different
> more recommended approach?
>
> Thanks
>
> Jim
>
>
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Achilleas Mantzios - cloud | 2024-06-27 13:27:54 | Re: Inconsistent query performance based on relation hit frequency |
Previous Message | Andrei Lepikhov | 2024-06-27 10:31:36 | Re: Inconsistent query performance based on relation hit frequency |