Re: Proposal for disk quota feature

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for disk quota feature
Date: 2018-09-21 11:32:16
Message-ID: CAB0yrem5nhKJ7y3tCNJn+Ug_qwzjsrNByn5PX2jZuRO9LbVp+A@mail.gmail.com
Lists: pgsql-hackers

Hi all,

We have redesigned the disk quota feature based on the comments from Pavel
Stehule and Chapman Flack. Here is the new design.

Overview

Basically, the disk quota feature is meant to support multi-tenant
environments: quota limits can be set on different levels of database
objects to avoid overuse of disk space. A common case could be as follows:
the DBA enables disk quota on a specified list of databases and sets disk
quota limits for tables/schemas/roles in these databases. A separate disk
quota worker process monitors the disk usage of these objects and detects
the objects which exceed their quota limit. Queries loading data into these
"out of disk quota" tables/schemas/roles will be cancelled.

We are currently at the initial implementation stage. We would like to
propose our idea first and get feedback from the community so we can
iterate quickly.

SQL Syntax (How to use disk quota)
disk quota)1 Specify the databases with disk quota enabled in GUC
“diskquota_databases” in postgresql.conf and restart the database.2 DBA
could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
(quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
‘3MB’);3 Simulate a schema out of quota limit case: suppose table a1 and
table a2 are both under schema s1.INSERT INTO a1 SELECT
generate_series(1,1000);INSERT INTO a2 SELECT
generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the

Architecture

Disk quota has the following components.

1. Quota Setting Store is where disk quota settings are stored and
accessed. We plan to use a catalog table pg_diskquota to store this
information. pg_diskquota looks like:

    CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
    {
        NameData    quotaname;      /* diskquota name */
        int16       quotatype;      /* diskquota type name */
        Oid         quotatargetoid; /* diskquota target db object oid */
        int32       quotalimit;     /* diskquota size limit in MB */
        int32       quotaredzone;   /* diskquota redzone in MB */
    } FormData_pg_diskquota;
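
To make this concrete, here is a minimal sketch (an illustration, not code
from the patch) of how CREATE DISK QUOTA could insert a tuple into
pg_diskquota using the ordinary catalog-manipulation routines; the
Natts_*/Anum_* constants below are assumptions derived from the struct
above:

    #include "postgres.h"
    #include "access/heapam.h"
    #include "access/htup_details.h"
    #include "catalog/indexing.h"
    #include "utils/builtins.h"

    /* hypothetical attribute numbers matching the struct above */
    #define Natts_pg_diskquota                  5
    #define Anum_pg_diskquota_quotaname         1
    #define Anum_pg_diskquota_quotatype         2
    #define Anum_pg_diskquota_quotatargetoid    3
    #define Anum_pg_diskquota_quotalimit        4
    #define Anum_pg_diskquota_quotaredzone      5

    static void
    InsertDiskQuotaTuple(const char *quotaname, int16 quotatype,
                         Oid targetoid, int32 limitmb, int32 redzonemb)
    {
        Relation    rel;
        HeapTuple   tup;
        Datum       values[Natts_pg_diskquota];
        bool        nulls[Natts_pg_diskquota];
        NameData    qname;

        namestrcpy(&qname, quotaname);
        memset(nulls, false, sizeof(nulls));
        values[Anum_pg_diskquota_quotaname - 1] = NameGetDatum(&qname);
        values[Anum_pg_diskquota_quotatype - 1] = Int16GetDatum(quotatype);
        values[Anum_pg_diskquota_quotatargetoid - 1] = ObjectIdGetDatum(targetoid);
        values[Anum_pg_diskquota_quotalimit - 1] = Int32GetDatum(limitmb);
        values[Anum_pg_diskquota_quotaredzone - 1] = Int32GetDatum(redzonemb);

        /* DiskQuotaRelationId comes from the CATALOG() declaration above */
        rel = heap_open(DiskQuotaRelationId, RowExclusiveLock);
        tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
        CatalogTupleInsert(rel, tup);       /* insert and update indexes */
        heap_freetuple(tup);
        heap_close(rel, RowExclusiveLock);
    }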

2. Quota Change Detector is the monitor of size changes of database
objects. We plan to use the stat collector to detect the 'active' table
list at the initial stage, but the stat collector has a limitation in
finding active tables that are being modified inside a running transaction.
See the TODO section for details. A sketch of the detection logic follows.
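
A minimal sketch, assuming the stat collector as the change source: a table
is considered 'active' when its cumulative change counters have moved since
the last refresh. Caching the previous counter per table is hypothetical;
pgstat_fetch_stat_tabentry() is the existing stats API:

    #include "postgres.h"
    #include "pgstat.h"

    static bool
    table_is_active(Oid relid, PgStat_Counter last_changes)
    {
        PgStat_StatTabEntry *entry = pgstat_fetch_stat_tabentry(relid);

        if (entry == NULL)
            return false;       /* no stats reported yet */

        /* any insert/update/delete can change the on-disk size */
        return (entry->tuples_inserted + entry->tuples_updated +
                entry->tuples_deleted) != last_changes;
    }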

3. Quota Size Checker is where sizes are calculated and compared with the
quota limits of database objects. According to Pavel's comment, the
autovacuum launcher and worker processes could be a good reference for disk
quota, so we plan to use a disk quota launcher daemon process and several
disk quota worker processes to do this work. The launcher process is
responsible for starting worker processes based on a user-defined database
list from the GUC. A worker process connects to its target database and
monitors the disk usage of objects in this database. In its init stage, the
worker process calls calculate_total_relation_size() to calculate the size
of each user table. After the init stage, the worker process refreshes the
disk model every N seconds. A refresh only recalculates the sizes of tables
in the 'active' table list generated by the Quota Change Detector, to
minimize the cost. A sketch of one refresh cycle follows.
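
A minimal sketch of one refresh cycle, assuming the active list is a list
of table OIDs; update_size_map() and check_quota_limits() are hypothetical
helpers, and the SQL-callable pg_total_relation_size() stands in here for
the internal calculate_total_relation_size():

    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/pg_list.h"
    #include "utils/fmgrprotos.h"

    static void update_size_map(Oid relid, int64 size);    /* hypothetical */
    static void check_quota_limits(void);                  /* hypothetical */

    static void
    refresh_disk_model(List *active_relids)
    {
        ListCell   *lc;

        foreach(lc, active_relids)
        {
            Oid     relid = lfirst_oid(lc);
            int64   size;

            /*
             * Recompute only the tables flagged as active. This assumes the
             * relation still exists; a real patch must handle dropped tables.
             */
            size = DatumGetInt64(DirectFunctionCall1(pg_total_relation_size,
                                                     ObjectIdGetDatum(relid)));
            update_size_map(relid, size);
        }

        /* compare per-table/schema/role totals against pg_diskquota */
        check_quota_limits();
    }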

4. Quota Enforcement Operator is where the quota limits are checked on the
postgres backend side. We will first implement it in ExecCheckRTPerms() as
pre-running enforcement: it checks the disk quota of the tables being
inserted into or updated, and reports an error if the quota of the table,
its schema, or its owner is exceeded. A sketch of this check follows.
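
A minimal sketch of the pre-running check, assuming the worker processes
maintain a shared-memory blacklist of over-quota objects;
diskquota_blacklist_contains() is a hypothetical lookup:

    #include "postgres.h"
    #include "nodes/parsenodes.h"

    static bool diskquota_blacklist_contains(Oid relid);   /* hypothetical */

    /*
     * Reject INSERT/UPDATE on relations whose table, schema, or owner is
     * over quota, at the point where ExecCheckRTPerms() checks permissions.
     */
    static void
    CheckDiskQuota(List *rangeTable)
    {
        ListCell   *lc;

        foreach(lc, rangeTable)
        {
            RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

            /* only relations being written to are of interest */
            if (rte->rtekind != RTE_RELATION ||
                (rte->requiredPerms & (ACL_INSERT | ACL_UPDATE)) == 0)
                continue;

            if (diskquota_blacklist_contains(rte->relid))
                ereport(ERROR,
                        (errcode(ERRCODE_DISK_FULL),
                         errmsg("schema's disk space quota exceeded")));
        }
    }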

As a native feature, we plan to add more checkpoints to enforce quotas on
running queries as well. For example, if only 10MB of quota is left, a
query could still insert 1GB of data: such a query would be allowed by the
pre-running enforcement check, but should be cancelled by the running-query
enforcement check. This improves the accuracy of disk quota enforcement. To
achieve this, we plan to add a checkpoint in a lower-level API such as
smgr_extend, so that the Quota Enforcement Operator checks disk quota usage
whenever smgr_extend is called. If the quota is exceeded, the current query
is cancelled, roughly as sketched below.
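
A minimal sketch of that storage-level checkpoint, written as a change
inside the existing smgrextend() in smgr.c (where the smgrsw dispatch table
is in scope); diskquota_check_extend() is a hypothetical blacklist lookup:

    #include "postgres.h"
    #include "storage/smgr.h"

    static bool diskquota_check_extend(RelFileNode rnode); /* hypothetical */

    void
    smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
               char *buffer, bool skipFsync)
    {
        /* cancel the query as soon as an over-quota relation tries to grow */
        if (!diskquota_check_extend(reln->smgr_rnode.node))
            ereport(ERROR,
                    (errcode(ERRCODE_DISK_FULL),
                     errmsg("disk space quota exceeded")));

        smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
                                             buffer, skipFsync);
    }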

Highlights

1. Native feature. Support native CREATE/DROP DISK QUOTA SQL statements,
with a new catalog table pg_diskquota to store disk quota settings.

2. Automatic DML/DDL detection. Table
create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
schema create/drop and role create/drop will be detected by disk quota
automatically.

3. Low-cost disk quota checker. The disk quota worker process needs to
refresh the disk usage model every N seconds. Since recalculating file
sizes with the stat() system call is expensive for a large number of files,
we use an 'active' table list to reduce the real work in each iteration. A
basic experiment with our initial-stage implementation on a database with
20K tables shows that a refresh costs about 1% CPU usage and finishes
within 50ms.

Todo/Limitation

Before we propose our patch, we plan to enhance it with the following
ideas:

1. Set the list of databases with disk quota enabled dynamically, without
restarting the database. Since we have the disk quota launcher process, it
could detect changes to the 'diskquota_databases' list and start/stop the
corresponding disk quota worker processes (see the sketch after this list).

2. Enforcement while a query is running. Consider the case where 10MB of
quota is left but the next query will insert 10GB of data: the current
enforcement design will allow this query to be executed. This is a
limitation of the 'active' table detection being based on the stat
collector, because a postgres backend only sends table stat information to
the collector when the transaction ends. We need a new way to detect an
'active' table even while it is being modified inside a running
transaction.

3. Monitor an unlimited number of databases. Currently we cap the number of
disk quota worker processes at 10 to limit the impact on normal workloads.
But what if we want to monitor the disk quota of more than 10 databases?
Our solution is to let the disk quota launcher manage a queue of databases
that need to be monitored. A disk quota worker process consumes the queue
and refreshes the disk usage/quota for one database; after some period, the
worker returns that database to the queue and fetches the next database
from the head of the queue. The period determines the delay in detecting
disk quota changes. To implement this, we need to support rebinding a
subprocess of postmaster to another database instead of the database bound
in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze.
Currently vacuum full and vacuum analyze are not tracked by the stat
collector.
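
For items 1 and 3 above, a minimal sketch of how the launcher could start a
per-database worker with the existing dynamic bgworker API; the entry-point
name and bgw_name below are assumptions:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    static bool
    start_diskquota_worker(Oid dbid)
    {
        BackgroundWorker        worker;
        BackgroundWorkerHandle *handle;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        /* "postgres" as library name means an in-core entry point */
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "diskquota_worker_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "disk quota worker for db %u", dbid);
        worker.bgw_main_arg = ObjectIdGetDatum(dbid);
        worker.bgw_notify_pid = MyProcPid;  /* launcher gets start/stop notices */

        return RegisterDynamicBackgroundWorker(&worker, &handle);
    }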

Thanks to Heikki, Pavel Stehule and Chapman Flack for their earlier
comments on the disk quota feature. Any comments on how to improve the disk
quota feature are appreciated.

On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

>
>
> 2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>
>> Thanks Pavel.
>> Your patch does enforcement at the storage level (md.c, or we could also
>> use smgr_extend). It's straightforward.
>> But I prefer to implement disk_quota as a feature with the following
>> objectives:
>> 1 set/alter disk quota settings on different database objects, e.g. user,
>> database, schema etc. Not only a general GUC: we could set separate quota
>> limits for specific objects.
>> 2 the enforcement operator should work at two positions: before the query
>> runs and while the query is running. The latter one's implementation may
>> be similar to your patch.
>>
>>
>
> The patch was just an example. The resource quotas should be more complex -
> per partition, table, schema, database, user - so GUCs are possible, but
> not very user friendly.
>
> Our case is specific, but not too much. The servers are used for
> multidimensional analyses - and some tables can grow too fast (COPY, INSERT
> SELECT). We need to enforce limits immediately. The implementation is
> simple, so I did it. The same implementation on the database level, or
> schema level, needs some more locks, so it will not be too effective.
> Resource management can be very complex, and I expect it will be hard work.
>
> Regards
>
> Pavel
>
>
>> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>> wrote:
>>
>>> Hi
>>>
>>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>
>>>> Thanks Chapman.
>>>> @Pavel, could you please explain more about your second suggestion "implement
>>>> some quotas on storage level?"
>>>>
>>>
>>> See attached patch - it is very simple - and good enough for our
>>> purposes.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>>
>>>> We will not keep long-lived processes attached to all databases (as you
>>>> mentioned, there are servers with thousands of databases).
>>>> And you are right, we could share ideas with the autovacuum process and
>>>> fork worker processes as needed.
>>>> "autovacuum checks for tables that have had a large number of
>>>> inserted, updated or deleted tuples. These checks use the statistics
>>>> collection facility"
>>>> The diskquota process is similar to autovacuum in caring about inserts,
>>>> but the difference is that it also cares about vacuum full, truncate and
>>>> drop, while update and delete may not be of interest since no file
>>>> change happens. So a separate diskquota process is preferred.
>>>>
>>>> So if we implement disk quota as a full native feature, then in the
>>>> first version I prefer to implement the following features:
>>>> 1 Fork a diskquota launcher process under the Postmaster ServerLoop,
>>>> which is long-lived.
>>>> 2 The diskquota launcher process is responsible for creating diskquota
>>>> worker processes for every database.
>>>> 3 Diskquota settings are stored in a separate catalog table for each
>>>> database.
>>>> 4 Initialization stage: the diskquota launcher process creates a
>>>> diskquota worker process for all the databases (traversing them like
>>>> autovacuum). The worker process calculates the disk usage of db objects
>>>> and checks their diskquota settings. If any db object exceeds its quota
>>>> limit, it is put into the blacklist in shared memory, which will later
>>>> be used by the enforcement operator. The worker process exits when its
>>>> work is done.
>>>> 5 Running stage: the diskquota launcher process creates a diskquota
>>>> worker process for any database with a large number of inserts, copies,
>>>> truncates, drops etc., or a create disk quota statement. The worker
>>>> process updates the file sizes of the db objects containing the result
>>>> relation and compares them with the diskquota settings. Again, if a
>>>> quota limit is exceeded, the objects are put into the blacklist, and
>>>> removed from the blacklist vice versa. The worker process exits when its
>>>> work is done, and a GUC could control how frequently a worker process is
>>>> restarted for a specific database. As you know, this GUC also controls
>>>> the delay of enforcement.
>>>> 6 Enforcement. When a postgres backend executes queries, it checks the
>>>> blacklist in shared memory to determine whether the query is allowed
>>>> (before execution) or needs to be rolled back (while executing).
>>>>
>>>> If we implemented disk quota as an extension, we could just use a
>>>> background worker to start the diskquota launcher process and use
>>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>>> processes from the launcher process, as suggested by @Chapman.
>>>> Diskquota settings could be stored in a user table in a separate schema
>>>> for each database (schema and table created by the CREATE EXTENSION
>>>> statement), just like what Heikki has done in the pg_quota project. But
>>>> in that case, we would need to create the extension in each database
>>>> before a diskquota worker process can be set up for that database.
>>>>
>>>> Any comments on the above design and which is preferred, native feature
>>>> or extension as the POC?
>>>>
>>>>
>>>> -- Hubert
>>>>
>>>>
>>>>
>>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com
>>>> > wrote:
>>>>
>>>>>
>>>>>
>>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <chap(at)anastigmatix(dot)net>:
>>>>>
>>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>>
>>>>>> > 2 Keep one worker process for each database. But using a
>>>>>> parent/global
>>>>>> > quota worker process to manage the lifecycle of database level
>>>>>> worker
>>>>>> > processes. It could handle the newly created database(avoid restart
>>>>>> > database) and save resource when a database is not used. But this
>>>>>> needs to
>>>>>> > change worker process to be hierarchical. Postmaster becomes the
>>>>>> grandfather
>>>>>> > of database level worker processes in this case.
>>>>>>
>>>>>> I am using background workers this way in 9.5 at $work.
>>>>>>
>>>>>> In my case, one worker lives forever, wakes up on a set period, and
>>>>>> starts a short-lived worker for every database, waiting for each
>>>>>> one before starting the next.
>>>>>>
>>>>>> It was straightforward to implement. Looking back over the code,
>>>>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>>>>> of each of its children, and also obtains a handle for each child
>>>>>> from RegisterDynamicBackgroundWorker().
>>>>>>
>>>>>> I imagine the global quota worker would prefer to start workers
>>>>>> for every database and then just wait for notifications from any
>>>>>> of them, but that seems equally straightforward at first glance.
>>>>>>
>>>>>
>>>>> There are servers with thousands of databases. A worker per database is
>>>>> not a good idea.
>>>>>
>>>>> It should share ideas and code with the autovacuum process.
>>>>>
>>>>> Not sure how effective an implementation based on bg workers can be.
>>>>> On servers with a large set of databases and a large set of tables, it
>>>>> can identify a too-big table too late.
>>>>>
>>>>> Isn't better to implement some quotas on storage level?
>>>>>
>>>>> Regards
>>>>>
>>>>> Pavel
>>>>>
>>>>>
>>>>>
>>>>>> -Chap
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>>
>>>> Hubert Zhang
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks
>>
>> Hubert Zhang
>>
>
>

--
Thanks

Hubert Zhang
