From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for disk quota feature
Date: 2018-09-02 12:44:40
Message-ID: CAFj8pRD7cWt6YOSemNJOd=t=irJp+U=dTjijxJY+s_FmdioXVA@mail.gmail.com
Lists: pgsql-hackers
Hi
2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
> Thanks Chapman.
> @Pavel, could you please explain more about your second suggestion "implement
> some quotas on storage level?"
>
See attached patch - it is very simple - and good enough for our purposes.
Regards
Pavel
> We will not keep long-lived processes attached to all databases (just
> like the servers with thousands of databases you mentioned).
> And you are right, we could share ideas with the autovacuum process and
> fork worker processes as needed.
> "autovacuum checks for tables that have had a large number of inserted,
> updated or deleted tuples. These checks use the statistics collection
> facility"
> The diskquota process is similar to autovacuum in caring about inserts, but
> the difference is that it also cares about VACUUM FULL, TRUNCATE and DROP,
> while UPDATE and DELETE may not be of interest since no file change happens.
> So a separate diskquota process is preferred.
>
> So if we implemented disk quota as a full native feature, in the
> first initial version I prefer to implement the following:
> 1 Fork a diskquota launcher process under the Postmaster server loop, which
> is long-lived.
> 2 The diskquota launcher process is responsible for creating a diskquota
> worker process for every database.
> 3 The diskquota setting is stored in a separate catalog table for each
> database.
> 4 Initialization stage: the diskquota launcher process creates a diskquota
> worker process for every database (traversing them like autovacuum does).
> The worker process calculates the disk usage of db objects and compares it
> with their diskquota settings. If any db object exceeds its quota limit, it
> is put into the blacklist in shared memory, which will later be used by the
> enforcement operator. The worker process exits when its work is done.
> 5 Running stage: the diskquota launcher process creates a diskquota worker
> process for any database with a large number of INSERT, COPY, TRUNCATE,
> DROP etc. or CREATE DISK QUOTA statements. The worker process updates the
> file size for db objects containing the result relation and compares it with
> the diskquota setting. Again, if it exceeds the quota limit, the object is
> put into the blacklist, and removed from the blacklist when it drops back
> under quota. The worker process exits when its work is done, and a GUC could
> control how frequently a worker process is restarted for a specific
> database. As you know, this GUC also controls the delay before enforcement
> takes effect.
> 6 Enforcement: when a Postgres backend executes a query, it checks the
> blacklist in shared memory to determine whether the query is allowed (before
> execution) or needs to be rolled back (while executing).
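The blacklist mechanism in steps 4-6 above can be sketched as a small standalone simulation. This is not the actual patch: in the real design the blacklist would live in PostgreSQL shared memory, keyed by (database OID, relation OID), and all names here (`worker_refresh`, `quota_permits`) are hypothetical stand-ins for the worker and backend sides.

```python
# Simulated shared-memory blacklist of over-quota relations,
# keyed by (database OID, relation OID).
blacklist: set[tuple[int, int]] = set()

def worker_refresh(dbid: int, usage: dict[int, int], quotas: dict[int, int]) -> None:
    """Worker side: recompute which relations exceed their quota."""
    for relid, size in usage.items():
        key = (dbid, relid)
        if relid in quotas and size > quotas[relid]:
            blacklist.add(key)       # over quota: block further inserts
        else:
            blacklist.discard(key)   # under quota again: unblock

def quota_permits(dbid: int, relid: int) -> bool:
    """Backend side: check before executing an INSERT/COPY into relid."""
    return (dbid, relid) not in blacklist

# Table 16384 holds 2 GB against a 1 GB quota; table 16385 is within quota.
worker_refresh(1, usage={16384: 2 << 30, 16385: 100 << 20},
                  quotas={16384: 1 << 30})
assert not quota_permits(1, 16384)   # its inserts are rejected
assert quota_permits(1, 16385)       # other tables are unaffected
```

Note that the delay between a relation exceeding its quota and the next worker pass is exactly the enforcement delay the GUC in step 5 would control.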
>
> If we implemented disk quota as an extension, we could just use a background
> worker to start the diskquota launcher process and use
> RegisterDynamicBackgroundWorker() to fork child diskquota worker
> processes from the launcher process, as suggested by @Chapman.
> The diskquota setting could be stored in a user table in a separate schema
> for each database (schema and table created by the CREATE EXTENSION
> statement), just like what Heikki has done in the pg_quota project. But in
> this case, we need to run CREATE EXTENSION in each database before a
> diskquota worker process can be set up for that database.
>
> Any comments on the above design, and which is preferred: native feature or
> extension as the POC?
>
>
> -- Hubert
>
>
>
> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>>
>>
>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <chap(at)anastigmatix(dot)net>:
>>
>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>
>>> > 2 Keep one worker process for each database, but use a parent/global
>>> > quota worker process to manage the lifecycle of database-level worker
>>> > processes. It could handle newly created databases (avoiding a database
>>> > restart) and save resources when a database is not used. But this needs
>>> > worker processes to be hierarchical; Postmaster becomes the grandfather
>>> > of database-level worker processes in this case.
>>>
>>> I am using background workers this way in 9.5 at $work.
>>>
>>> In my case, one worker lives forever, wakes up on a set period, and
>>> starts a short-lived worker for every database, waiting for each
>>> one before starting the next.
>>>
>>> It was straightforward to implement. Looking back over the code,
>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>> of each of its children, and also obtains a handle for each child
>>> from RegisterDynamicBackgroundWorker().
>>>
>>> I imagine the global quota worker would prefer to start workers
>>> for every database and then just wait for notifications from any
>>> of them, but that seems equally straightforward at first glance.
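The pattern Chapman describes (a long-lived launcher that starts one short-lived worker per database and waits for each before starting the next) can be simulated with plain OS processes. In PostgreSQL this would use RegisterDynamicBackgroundWorker() and bgw_notify_pid; here Python's multiprocessing stands in for bgworkers, and the worker body is a hypothetical placeholder.

```python
import multiprocessing

def worker(dbname: str, results) -> None:
    # A real diskquota worker would connect to dbname and compute
    # per-relation disk usage; here we just record that it ran.
    results.append(dbname)

def launcher_pass(databases: list[str]) -> list[str]:
    """One wake-up of the long-lived launcher: run one short-lived
    worker per database, waiting for each before starting the next."""
    with multiprocessing.Manager() as mgr:
        results = mgr.list()
        for db in databases:
            p = multiprocessing.Process(target=worker, args=(db, results))
            p.start()
            p.join()          # wait for this worker before the next one
        return list(results)

if __name__ == "__main__":
    done = launcher_pass(["postgres", "app_db", "stats_db"])
    assert done == ["postgres", "app_db", "stats_db"]
```

The sequential join is what keeps at most one worker alive at a time; the alternative Chapman mentions (start all workers, then wait for notifications from any of them) trades that bound for parallelism.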
>>>
>>
>> There are servers with thousands of databases. A worker per database is
>> not a good idea.
>>
>> It should share ideas and code with the autovacuum process.
>>
>> I am not sure how effective an implementation based on bg workers can be.
>> On servers with a large set of databases and a large set of tables, it may
>> identify an oversized table too late.
>>
>> Isn't it better to implement some quotas on the storage level?
>>
>> Regards
>>
>> Pavel
>>
>>
>>
>>> -Chap
>>>
>>>
>>
>
>
> --
> Thanks
>
> Hubert Zhang
>
Attachment: relation_limit.patch (text/x-patch, 3.9 KB)