Re: Proposal for disk quota feature

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for disk quota feature
Date: 2018-09-21 14:21:16
Message-ID: CAB0yren2VwEDPfPWwch==C8Xv2z4EzxWdhD4xJQvs-yoKtKK6g@mail.gmail.com
Lists: pgsql-hackers

>
> just fast reaction - why a QUOTA object?
> Isn't ALTER SET enough?
> Something like
> ALTER TABLE a1 SET quota = '1MB';
> ALTER USER ...
> ALTER SCHEMA ..
> New DDL commands look like too hard a hammer.

It's an option. I prefer to think about where the quota settings are stored:
the CREATE DISK QUOTA approach is more natural for storing quota settings in
a separate pg_diskquota catalog, while the ALTER SET approach is closer to
storing quota settings in pg_class, pg_authid, pg_namespace, etc. in an
integrated way.
(Note that neither mapping is mandatory; ALTER SET could also store its
settings in pg_diskquota, and vice versa.)
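
For concreteness, a side-by-side of the two syntaxes under discussion (both
illustrative, not final):

    -- separate DDL object; settings live in a dedicated pg_diskquota catalog
    CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = '1MB');
    DROP DISK QUOTA tablea1;

    -- ALTER ... SET; settings live with the object's own catalog entry
    ALTER TABLE a1 SET (quota = '1MB');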

Here are some differences I can think of:
1 pg_authid is a shared (cluster-wide) catalog, not per-database. It is hard
to track a user's disk usage across the whole cluster (consider 1000+
databases), so the semantics of CREATE DISK QUOTA ON USER are limited: it
only tracks the user's disk usage inside the current database.
2 A separate pg_diskquota catalog can hold more fields besides the quota
limit without adding too many fields to pg_class, e.g. a red zone to give
the user a warning, or the current disk usage of the db objects.
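
As an illustrative sketch (catalog and column names as in our draft; the
query itself is hypothetical), the extra fields would let a DBA inspect
limits and warning thresholds in one place:

    SELECT quotaname, quotalimit, quotaredzone
    FROM pg_diskquota
    WHERE quotatargetoid = 'a1'::regclass;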

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

>
> On Fri, Sep 21, 2018 at 13:32, Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:
>
>>
>> Hi all,
>>
>> We redesigned the disk quota feature based on the comments from Pavel
>> Stehule and Chapman Flack. Here is the new design.
>>
>> Overview
>> Basically, the disk quota feature is used to support multi-tenancy
>> environments: different levels of database objects can be given a quota
>> limit to avoid overuse of disk space. A common case could be as follows:
>> the DBA enables disk quota on a specified database list, then sets disk
>> quota limits for tables/schemas/roles in these databases. A separate disk
>> quota worker process monitors the disk usage of these objects and detects
>> the objects which exceed their quota limit. Queries loading data into
>> these "out of disk quota" tables/schemas/roles will be cancelled.
>>
>> We are currently working on the initial implementation. We would like to
>> propose our idea first and get feedback from the community to iterate
>> quickly.
>>
>> SQL Syntax (How to use disk quota)
>> 1 Specify the databases with disk quota enabled in the GUC
>> "diskquota_databases" in postgresql.conf and restart the database.
>> 2 The DBA can set a disk quota limit for a table/schema/role:
>> CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = '1MB');
>> CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = '1GB');
>> CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = '3MB');
>>
>
> just fast reaction - why a QUOTA object?
>
> Isn't ALTER SET enough?
>
> Something like
>
> ALTER TABLE a1 SET quota = '1MB';
> ALTER USER ...
> ALTER SCHEMA ..
>
> New DDL commands look like too hard a hammer.
>
>>
>> 3 Simulate a schema out-of-quota case: suppose table a1 and table a2 are
>> both under schema s1.
>> INSERT INTO a1 SELECT generate_series(1,1000);
>> INSERT INTO a2 SELECT generate_series(1,3000000);
>> SELECT pg_sleep(5);
>> INSERT INTO a1 SELECT generate_series(1,1000);
>> ERROR: schema's disk space quota exceeded
>> DROP TABLE a2;
>> SELECT pg_sleep(5);
>> INSERT INTO a1 SELECT generate_series(1,1000);
>> INSERT 0 1000
>>
>> Architecture
>> Disk quota has the following components.
>>
>> 1. Quota Setting Store is where the disk quota settings are stored and
>> accessed. We plan to use a catalog table pg_diskquota to store this
>> information. pg_diskquota looks like:
>>
>> CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
>> {
>>     NameData quotaname;      /* diskquota name */
>>     int16    quotatype;      /* diskquota type name */
>>     Oid      quotatargetoid; /* diskquota target db object oid */
>>     int32    quotalimit;     /* diskquota size limit in MB */
>>     int32    quotaredzone;   /* diskquota redzone in MB */
>> } FormData_pg_diskquota;
>>
>> 2. Quota Change Detector is the monitor of size changes of database
>> objects. We plan to use the stats collector to detect the 'active' table
>> list at the initial stage. But the stats collector has some limitations
>> in finding an active table which is in a running transaction; see the
>> TODO section for details.
>>
>> 3. Quota Size Checker is where sizes are calculated and compared with the
>> quota limits of database objects. According to Pavel's comment, the
>> autovacuum launcher and worker processes could be a good reference for
>> disk quota, so we plan to use a disk quota launcher daemon process and
>> several disk quota worker processes to do this work. The launcher process
>> is responsible for starting worker processes based on a user-defined
>> database list from a GUC. A worker process connects to its target database
>> and monitors the disk usage of objects in that database. In its init
>> stage, a worker process calls calculate_total_relation_size() to calculate
>> the size of each user table. After the init stage, the worker process
>> refreshes the disk model every N seconds. Refreshing only recalculates the
>> sizes of tables in the 'active' table list, which is generated by the
>> Quota Change Detector, to minimize the cost.
>>
>> 4. Quota Enforcement Operator is where the quota limit is checked on the
>> postgres backend side. We will first implement it in ExecCheckRTPerms()
>> as pre-running enforcement. It checks the disk quota of tables being
>> inserted or updated, and reports an error if the quota limit of the
>> table, its schema, or its owner is exceeded. As a native feature, we plan
>> to add more checkpoints to do running-query enforcement. For example, if
>> only 10MB of quota is left, a query could still insert 1GB of data; such
>> a query would be allowed by the pre-running enforcement check but would
>> be cancelled by the running-query enforcement check, which improves the
>> accuracy of disk quota usage. To achieve this, we plan to add a
>> checkpoint in a lower API such as smgr_extend: the Quota Enforcement
>> Operator will check the disk quota usage when smgr_extend is called, and
>> if the quota is exceeded, the current query will be cancelled.
>>
>> Highlights
>> 1. Native feature. Support native CREATE/DROP DISK QUOTA SQL statements,
>> with a new catalog table pg_diskquota to store disk quota settings.
>> 2. Auto DML/DDL detection. Table create/update/insert/delete/vacuum/
>> truncate/drop/schema_change/owner_change, schema create/drop, and role
>> create/drop are detected by disk quota automatically.
>> 3. Low-cost disk quota checker. The disk quota worker process needs to
>> refresh the disk usage model every N seconds. Since recalculating file
>> sizes using the stat() system call is expensive for a large number of
>> files, we use an 'active' table list to reduce the real work in each
>> iteration. A basic experiment with our init-stage implementation on a
>> database with 20K tables shows that the refresh costs 1% CPU usage and
>> finishes within 50ms.
>>
>> Todo/Limitations
>> Before we propose our patch, we plan to enhance it with the following
>> ideas:
>> 1. Set the database list with disk quota enabled dynamically, without
>> restarting the database. Since we have the disk quota launcher process,
>> it could detect a new 'diskquota_databases' list and start/stop the
>> corresponding disk quota worker processes.
>> 2. Enforcement while a query is running. Consider the case where 10MB of
>> quota is left, but the next query will insert 10GB of data: the current
>> enforcement design allows this query to execute. This is limited by the
>> fact that 'active' table detection relies on the stats collector, and a
>> postgres backend only sends table stat information to the collector when
>> the transaction ends. We need a new way to detect an 'active' table even
>> while it is being modified inside a running transaction.
>> 3. Monitor an unlimited number of databases. Currently we set the max
>> number of disk quota worker processes to 10 to reduce the impact on
>> normal workloads. But what if we want to monitor the disk quota of more
>> than 10 databases? Our solution is to let the disk quota launcher manage
>> a queue of databases to be monitored. Disk quota worker processes consume
>> the queue and refresh the disk usage/quota for their databases. After
>> some period, a worker returns its database to the queue and fetches the
>> next database from the queue to process. The period determines the delay
>> in detecting disk quota changes. To implement this feature, we need to
>> allow a subprocess of postmaster to rebind to another database instead of
>> the database bound in InitPostgres().
>> 4. Support active table detection on VACUUM FULL and VACUUM ANALYZE.
>> Currently VACUUM FULL and VACUUM ANALYZE are not tracked by the stats
>> collector.
>>
>> Thanks to Heikki, Pavel Stehule and Chapman Flack for the earlier
>> comments on the disk quota feature. Any comments on how to improve the
>> disk quota feature are appreciated.
>>
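
As a minimal sketch of the smgr_extend checkpoint described in the quoted
proposal above (DiskQuotaBlacklisted() is a hypothetical helper, not an
existing API; smgrextend() and ERRCODE_DISK_FULL are real):

    /* hypothetical wrapper: check quota before extending a relation file */
    static void
    diskquota_smgrextend(SMgrRelation reln, ForkNumber forknum,
                         BlockNumber blocknum, char *buffer, bool skipFsync)
    {
        /* consult the shared-memory blacklist maintained by the workers */
        if (DiskQuotaBlacklisted(reln->smgr_rnode.node))
            ereport(ERROR,
                    (errcode(ERRCODE_DISK_FULL),
                     errmsg("disk quota exceeded, query cancelled")));

        smgrextend(reln, forknum, blocknum, buffer, skipFsync);
    }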
>>
>> On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>> wrote:
>>
>>>
>>>
>>> 2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>
>>>> Thanks Pavel.
>>>> Your patch does enforcement at the storage level (md.c, or we could
>>>> also use smgr_extend). It's straightforward.
>>>> But I prefer to implement disk quota as a feature with the following
>>>> objectives:
>>>> 1 Set/alter disk quota settings on different database objects, e.g.
>>>> user, database, schema, etc. - not only a general GUC; we could set a
>>>> separate quota limit for specific objects.
>>>> 2 The enforcement operator should work at two points: before a query
>>>> runs and while it is running. The latter's implementation may be
>>>> similar to your patch.
>>>>
>>>
>>> The patch was just an example. The resource quotas should be more
>>> complex - per partition, table, schema, database, user - so GUCs are
>>> possible, but not very user friendly.
>>>
>>> Our case is specific, but not too much. The servers are used for
>>> multidimensional analyses - and some tables can grow too fast (COPY,
>>> INSERT SELECT). We need to enforce limits immediately. The
>>> implementation is simple, so I did it. The same implementation at the
>>> database or schema level needs some more locks, so it will not be very
>>> effective. Resource management can be very complex, and I expect it
>>> will be hard work.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>>> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>>>
>>>>>> Thanks Chapman.
>>>>>> @Pavel, could you please explain more about your second suggestion,
>>>>>> "implement some quotas at the storage level"?
>>>>>>
>>>>>
>>>>> See the attached patch - it is very simple - and good enough for our
>>>>> purposes.
>>>>>
>>>>> Regards
>>>>>
>>>>> Pavel
>>>>>
>>>>>
>>>>>
>>>>>> We will not keep long-lived processes attached to all databases (as
>>>>>> you mentioned, there are servers with thousands of databases).
>>>>>> And you are right, we could share ideas with the autovacuum process
>>>>>> and fork worker processes as needed.
>>>>>> "autovacuum checks for tables that have had a large number of
>>>>>> inserted, updated or deleted tuples. These checks use the statistics
>>>>>> collection facility"
>>>>>> The diskquota process is similar to autovacuum in caring about
>>>>>> inserts, but the difference is that it also cares about vacuum full,
>>>>>> truncate and drop, while update and delete may not be interesting
>>>>>> since no file change happens. So a separate diskquota process is
>>>>>> preferred.
>>>>>>
>>>>>> So if we implement disk quota as a full native feature, in the first
>>>>>> version I prefer to implement the following:
>>>>>> 1 Fork a diskquota launcher process under the Postmaster server
>>>>>> loop; it is long-lived.
>>>>>> 2 The diskquota launcher process is responsible for creating a
>>>>>> diskquota worker process for every database.
>>>>>> 3 Diskquota settings are stored in a separate catalog table for each
>>>>>> database.
>>>>>> 4 Initialization stage: the diskquota launcher process creates
>>>>>> diskquota worker processes for all the databases (traversing them
>>>>>> like autovacuum). A worker process calculates the disk usage of db
>>>>>> objects and checks it against their diskquota settings. If any db
>>>>>> object exceeds its quota limit, it is put into the blacklist in
>>>>>> shared memory, which is later used by the enforcement operator. The
>>>>>> worker process exits when its work is done.
>>>>>> 5 Running stage: the diskquota launcher process creates a diskquota
>>>>>> worker process for any database with a large number of inserts,
>>>>>> copies, truncates, drops, etc., or a CREATE DISK QUOTA statement.
>>>>>> The worker process updates the file sizes of the db objects
>>>>>> containing the result relation and compares them with the diskquota
>>>>>> settings. Again, objects exceeding their quota limit are put into
>>>>>> the blacklist, and removed from the blacklist otherwise. The worker
>>>>>> process exits when its work is done, and a GUC controls the
>>>>>> frequency of worker process restarts for a specific database. As you
>>>>>> know, this GUC also controls the delay before we do enforcement.
>>>>>> 6 Enforcement: when a postgres backend executes queries, it checks
>>>>>> the blacklist in shared memory to determine whether the query should
>>>>>> be allowed (before execution) or rolled back (while executing).
>>>>>>
>>>>>> If we implemented disk quota as an extension, we could just use a
>>>>>> background worker to start the diskquota launcher process and use
>>>>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>>>>> processes from the launcher process, as suggested by @Chapman.
>>>>>> Diskquota settings could be stored in a user table in a separate
>>>>>> schema for each database (schema and table created by the CREATE
>>>>>> EXTENSION statement), just like what Heikki has done in the pg_quota
>>>>>> project. But in this case, we need to create the extension in each
>>>>>> database before a diskquota worker process can be set up for that
>>>>>> database.
>>>>>>
>>>>>> Any comments on the above design, and which is preferred as the POC:
>>>>>> native feature or extension?
>>>>>>
>>>>>>
>>>>>> -- Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
>>>>>> pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <chap(at)anastigmatix(dot)net>:
>>>>>>>
>>>>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>>>>
>>>>>>>> > 2 Keep one worker process for each database, but use a
>>>>>>>> > parent/global quota worker process to manage the lifecycle of the
>>>>>>>> > database-level worker processes. It could handle newly created
>>>>>>>> > databases (avoiding a database restart) and save resources when a
>>>>>>>> > database is not used. But this needs the worker processes to be
>>>>>>>> > hierarchical; Postmaster becomes the grandfather of the
>>>>>>>> > database-level worker processes in this case.
>>>>>>>>
>>>>>>>> I am using background workers this way in 9.5 at $work.
>>>>>>>>
>>>>>>>> In my case, one worker lives forever, wakes up on a set period, and
>>>>>>>> starts a short-lived worker for every database, waiting for each
>>>>>>>> one before starting the next.
>>>>>>>>
>>>>>>>> It was straightforward to implement. Looking back over the code,
>>>>>>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>>>>>>> of each of its children, and also obtains a handle for each child
>>>>>>>> from RegisterDynamicBackgroundWorker().
>>>>>>>>
>>>>>>>> I imagine the global quota worker would prefer to start workers
>>>>>>>> for every database and then just wait for notifications from any
>>>>>>>> of them, but that seems equally straightforward at first glance.
>>>>>>>>
>>>>>>>
>>>>>>> There are servers with thousands of databases. A worker per
>>>>>>> database is not a good idea.
>>>>>>>
>>>>>>> It should share ideas and code with the autovacuum process.
>>>>>>>
>>>>>>> I am not sure how effective an implementation based on bg workers
>>>>>>> can be. On servers with a large set of databases and a large set of
>>>>>>> tables, it can identify a too-big table too late.
>>>>>>>
>>>>>>> Isn't it better to implement some quotas at the storage level?
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Pavel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -Chap
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>>
>>>>>> Hubert Zhang
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>>
>>>> Hubert Zhang
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks
>>
>> Hubert Zhang
>>
>
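
On the extension option discussed up-thread, a minimal sketch of how the
launcher could start a per-database worker with the dynamic bgworker API
Chapman mentioned (the library name, entry point and dboid variable are
hypothetical; the bgworker API calls are real):

    /* hypothetical: launcher registers one diskquota worker per database */
    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                       BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "diskquota_worker_main");
    snprintf(worker.bgw_name, BGW_MAXLEN, "diskquota worker %u", dboid);
    worker.bgw_main_arg = ObjectIdGetDatum(dboid);  /* target database oid */
    worker.bgw_notify_pid = MyProcPid;              /* notify the launcher */

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(LOG,
                (errmsg("could not register diskquota worker for database %u",
                        dboid)));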

--
Thanks

Hubert Zhang
