Re: Background worker assistance & review

From: Keith Fiske <keith(at)omniti(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Background worker assistance & review
Date: 2015-04-21 23:12:15
Message-ID: CAG1_KcCFua7xKzQEQ_4k76+_t281gDpKS5WYuVe4F+Z8=vgMLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

<http://www.keithf4.com>

On Tue, Apr 21, 2015 at 5:47 PM, Keith Fiske <keith(at)omniti(dot)com> wrote:

>
>
>
> <http://www.keithf4.com>
>
> On Fri, Apr 10, 2015 at 1:00 PM, Keith Fiske <keith(at)omniti(dot)com> wrote:
>
>>
>> On Thu, Apr 9, 2015 at 11:56 PM, Craig Ringer <craig(at)2ndquadrant(dot)com>
>> wrote:
>>
>>>
>>>
>>> On 9 April 2015 at 05:35, Keith Fiske <keith(at)omniti(dot)com> wrote:
>>>
>>>> I'm working on a background worker (BGW) for my pg_partman extension.
>>>> I've gotten the basics of it working for my first round, but there's two
>>>> features I'm missing that I'd like to add before release:
>>>>
>>>> 1) Only allow one instance of this BGW to run
>>>>
>>>
>>> Load your extension in shared_preload_libraries, so that _PG_init runs
>>> in the postmaster. Register a static background worker then.
>>>
>>> If you need one worker per database (because it needs to access the DB)
>>> this won't work for you, though. What we do in BDR is have a single static
>>> background worker that's launched by the postmaster, which then launches
>>> and terminates per-database workers that do the "real work".
>>>
>>> Because of a limitation in the bgworker API in releases 9.4 and older,
>>> the static worker has to connect to a database if it wants to access shared
>>> catalogs like pg_database. This limitation has been lifted in 9.5 though,
>>> along with the need to use the database name instead of its oid to connect
>>> (which left bgworkers unable to handle RENAME DATABASE).
>>>
>>> (We still really need a hook on CREATE DATABASE too)
>>>
>>> 2) Create a bgw_terminate_partman() function to stop it more intuitively
>>>> than doing a pg_cancel_backend() on the PID
>>>>
>>>
>>> If you want it to be able to be started/stopped dynamically, you should
>>> probably use RequestAddinShmemSpace to allocate a small shared memory
>>> block. Use that to register the PGPROC for the current worker when the
>>> worker starts, and add a boolean field you can use to ask it to terminate
>>> its self. You'll also need a LWLock to protect access to the segment, so
>>> you don't have races between a worker starting and the user asking to
>>> cancel it, etc.
>>>
>>> Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot
>>> store it in shared memory when it's returned by
>>> RegisterDynamicBackgroundWorker() and use it to later check the worker's
>>> status or ask it to exit. You have to use regular backend manipulation
>>> functions and PGPROC instead.
>>>
>>> Personally, I suggest that you leave the worker as a static worker, and
>>> leave it always running when the extension is active. If it isn't doing
>>> anything, have it sleep on its latch, then set its latch from other
>>> processes when something interesting happens. (You can put the process
>>> latch from PGPROC into your shmem segment so you can set it from elsewhere,
>>> or allocate a new latch).
>>>
>>> This is my first venture into writing C code for postgres, so I'm not
>>>> familiar with a lot of the internals yet. I read
>>>> http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it
>>>> mentioning how you can check the status of a BGW launched dynamically and
>>>> the function to terminate one, but I'm not clear how how you can get the
>>>> information on a currently running BGW to do these things.
>>>>
>>>
>>> You can't. It's a pretty significant limitation in the current API.
>>> There's no way to enumerate bgworkers via the bgworker API, only via PGPROC.
>>>
>>>
>>>> I used the worker_spi example for a lot of this, so if there's any
>>>> additional guidance for a better way to do what I've done, I'd appreciate
>>>> it. All I really have it doing now is calling the run_maintenance()
>>>> function at a defined interval and don't need it doing more than that yet.
>>>> <http://www.keithf4.com>
>>>>
>>>
>>> The BDR project has an extension with much more in-depth use of
>>> background workers, but it's probably *too* complicated. We have a static
>>> bgworker that launches and terminates dynamic bgworkers (per-database) that
>>> in turn launch and terminate more dynamic background workers
>>> (per-connection to peer databases).
>>>
>>> If you're interested, all the code is mirrored on github:
>>>
>>> https://github.com/2ndquadrant/bdr/tree/bdr-plugin/next
>>>
>>> and the relevant parts are:
>>>
>>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.c#L640
>>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_perdb.c
>>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_supervisor.c
>>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_shmem.c
>>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401
>>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h
>>>
>>> ... but there's a *lot* of code there.
>>>
>>> --
>>> Craig Ringer http://www.2ndQuadrant.com/
>>> PostgreSQL Development, 24x7 Support, Training & Services
>>>
>>
>>
>> Craig,
>>
>> Thanks for the response! Definitely cleared up a lot of questions I had
>> regarding how to interact with currently running BGWs. Glad to know I can
>> at least stop banging my head against the desk about that. I've still got a
>> lot to learn as far as how to interact with shared memory, but now that I
>> know that's the path I have to go down, I'm fine with that.
>>
>> My current plan now after your response this this:
>>
>> - Statically launch master BGW with shared_preload_libraries
>> - Use dynamically launched BGW for each database that pg_partman will run
>> on in the cluster. My previous idea of restricting it to one BGW would
>> likely have stopped it from ever working on more than one database in a
>> cluster.
>> - Will see if I can create a function that polls the cluster for
>> currently existing databases that actually have pg_partman installed. This
>> should eliminate the need for a GUC naming the databases to run for. Should
>> allow handling if a database is renamed as well. This way, as soon as the
>> extension is created on a database, it should hopefully "just work" and
>> start managing it.
>>
>> 9.4 is my targeted release to support for a while, so I'll just have to
>> deal with the shortcomings you mentioned. Does the above sound like it
>> could work then?
>>
>> --
>> Keith Fiske
>> Database Administrator
>> OmniTI Computer Consulting, Inc.
>> http://www.keithf4.com
>>
>>
> So I've made some progress and thought I'd found a way around requiring
> the use of shared memory (for now).
>
> https://gist.github.com/keithf4/3dbda40f7f941ada7ffe
>
> The idea is there will be a main BGW that is always running in a loop. At
> the interval time period, the main loop will parse a GUC variable to get a
> list of databases and launch a dynamic BGW for each one to run the
> partition maintenance if pg_partman is installed.
>
> I looked up the method that the shared_preload_libraries GUC uses to parse
> a csv list of values starting on line 238 with making a copy of the GUC
> variable to parse out
>
> https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L238
>
> Everything within the pg_partman_bgw_main() function can read the value
> fine, all the way to the point where it assigns the dbname variable here
>
> https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L251
>
> However, once I try and pass this variable to the dynamic BGW function, it
> seems to lose access to it. An example from my postgres logs:
>
> 2015-04-21 17:18:57 EDT [] [6866]: [6-1] user=,db=,e=00000 LOG: starting
> background worker process "pg_partman master background worker"
> 2015-04-21 17:18:57 EDT [] [6866]: [7-1] user=,db=,e=00000 LOG: database
> system is ready to accept connections
> 2015-04-21 17:18:57 EDT [] [6875]: [1-1] user=,db=,e=00000 LOG:
> pg_partman master background worker master process initialized with role
> keith on database template1
> 2015-04-21 17:18:57 EDT [] [6872]: [1-1] user=,db=,e=00000 LOG:
> autovacuum launcher started
> 2015-04-21 17:19:17 EDT [] [6875]: [2-1] user=,db=,e=00000 DEBUG: Dynamic
> bgw launch begun for keith
> 2015-04-21 17:19:17 EDT [] [6866]: [8-1] user=,db=,e=00000 LOG:
> registering background worker "pg_partman dynamic background worker
> (dbname=keith)"
> 2015-04-21 17:19:17 EDT [] [6866]: [9-1] user=,db=,e=00000 LOG: starting
> background worker process "pg_partman dynamic background worker
> (dbname=keith)"
> 2015-04-21 17:19:17 EDT [] [6877]: [1-1] user=,db=,e=00000 DEBUG: Before
> run_maint initialize connection for db @òö
> 2015-04-21 17:19:17 EDT [] [6877]: [2-1] user=,db=,e=3D000 FATAL:
> database " @òö " does not exist
> 2015-04-21 17:19:17 EDT [] [6866]: [10-1] user=,db=,e=00000 LOG: worker
> process: pg_partman dynamic background worker (dbname=keith) (PID 6877)
> exited with exit code 1
> 2015-04-21 17:19:17 EDT [] [6866]: [11-1] user=,db=,e=00000 LOG:
> unregistering background worker "pg_partman dynamic background worker
> (dbname=keith)"
>
> You can see the first line of the pg_partman_bgw_run_maint() function just
> sees the variable as junk
>
> https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L312
>
> Doing some debugging, I've narrowed down the issue back to where the
> pstrdup() makes a copy of the GUC variable. If I just have a single, valid
> database set for pg_partman_bgw.dbname like this in postgresql.conf
>
> pg_partman_bgw.dbname = 'keith'
>
> And I then replace this line
>
> https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L262
> with
> worker.bgw_main_arg = CStringGetDatum(pg_partman_bgw_dbname);
> everything works fine
>
> 2015-04-21 17:40:18 EDT [] [7113]: [8-1] user=,db=,e=00000 LOG:
> registering background worker "pg_partman dynamic background worker
> (dbname=keith)"
> 2015-04-21 17:40:18 EDT [] [7113]: [9-1] user=,db=,e=00000 LOG: starting
> background worker process "pg_partman dynamic background worker
> (dbname=keith)"
> 2015-04-21 17:40:18 EDT [] [7127]: [1-1] user=,db=,e=00000 DEBUG: Before
> run_maint initialize connection for db keith
> 2015-04-21 17:40:18 EDT [] [7127]: [2-1] user=,db=,e=00000 DEBUG: After
> run_maint initialize connection for db keith
> 2015-04-21 17:40:18 EDT [] [7127]: [3-1] user=,db=,e=00000 DEBUG:
> Checking if pg_partman extension is installed in database: keith
> 2015-04-21 17:40:18 EDT [] [7127]: [4-1] user=,db=,e=00000 LOG:
> pg_partman dynamic background worker (dbname=keith) dynamic background
> worker initialized with role keith on database keith
> 2015-04-21 17:40:18 EDT [] [7127]: [5-1] user=,db=,e=00000 LOG:
> pg_partman dynamic background worker (dbname=keith): SELECT
> partman.run_maintenance(p_analyze := true, p_jobmon := false) called by
> role keith on database keith
> 2015-04-21 17:40:18 EDT [] [7113]: [10-1] user=,db=,e=00000 LOG: worker
> process: pg_partman dynamic background worker (dbname=keith) (PID 7127)
> exited with exit code 0
> 2015-04-21 17:40:18 EDT [] [7113]: [11-1] user=,db=,e=00000 LOG:
> unregistering background worker "pg_partman dynamic background worker
> (dbname=keith)"
>
> Obviously I want to be able to support more than a single database so I
> need a method of splitting the GUC variable. I even tried using a simpler
> method a coworker showed me using strtok_r() instead of this List method,
> but had the same result. All the examples in other tools I've seen for the
> bgw_main_arg value have been Int32 values (including your BDR tool). Is
> there any way to pass a string as the argument bgw_main_arg?
>
> --
> Keith Fiske
> Database Administrator
> OmniTI Computer Consulting, Inc.
> http://www.keithf4.com
>

Actually, to clarify that last question, is there any way to pass a
non-global string variable, or one defined in the main BGW function to
another BGW? I tested passing a string I defined in pg_partman_bgw_main()
to the bgw_main_arg and the same junk resulted (and yes, I'm sure I
passed-by-reference). I can only guess that the reason the
pg_partman_bgw_dbname GUC works is because of the context that GUC's are
defined in?

Keith

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Marc-André Goderre 2015-04-22 13:14:28 function returning a merge of the same query executed X time
Previous Message Keith Fiske 2015-04-21 21:47:10 Re: Background worker assistance & review