Re: Background worker assistance & review

From: Keith Fiske <keith(at)omniti(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Background worker assistance & review
Date: 2015-04-21 21:47:10
Message-ID: CAG1_KcCzcWqxERS5p3UA=REW97zqfJu=yCCFOoiWat1ZFkJtLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

<http://www.keithf4.com>

On Fri, Apr 10, 2015 at 1:00 PM, Keith Fiske <keith(at)omniti(dot)com> wrote:

>
> On Thu, Apr 9, 2015 at 11:56 PM, Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote:
>
>>
>>
>> On 9 April 2015 at 05:35, Keith Fiske <keith(at)omniti(dot)com> wrote:
>>
>>> I'm working on a background worker (BGW) for my pg_partman extension.
>>> I've gotten the basics of it working for my first round, but there's two
>>> features I'm missing that I'd like to add before release:
>>>
>>> 1) Only allow one instance of this BGW to run
>>>
>>
>> Load your extension in shared_preload_libraries, so that _PG_init runs in
>> the postmaster. Register a static background worker then.
>>
>> If you need one worker per database (because it needs to access the DB)
>> this won't work for you, though. What we do in BDR is have a single static
>> background worker that's launched by the postmaster, which then launches
>> and terminates per-database workers that do the "real work".
>>
>> Because of a limitation in the bgworker API in releases 9.4 and older,
>> the static worker has to connect to a database if it wants to access shared
>> catalogs like pg_database. This limitation has been lifted in 9.5 though,
>> along with the need to use the database name instead of its oid to connect
>> (which left bgworkers unable to handle RENAME DATABASE).
>>
>> (We still really need a hook on CREATE DATABASE too)
>>
>> 2) Create a bgw_terminate_partman() function to stop it more intuitively
>>> than doing a pg_cancel_backend() on the PID
>>>
>>
>> If you want it to be able to be started/stopped dynamically, you should
>> probably use RequestAddinShmemSpace to allocate a small shared memory
>> block. Use that to register the PGPROC for the current worker when the
>> worker starts, and add a boolean field you can use to ask it to terminate
>> its self. You'll also need a LWLock to protect access to the segment, so
>> you don't have races between a worker starting and the user asking to
>> cancel it, etc.
>>
>> Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot
>> store it in shared memory when it's returned by
>> RegisterDynamicBackgroundWorker() and use it to later check the worker's
>> status or ask it to exit. You have to use regular backend manipulation
>> functions and PGPROC instead.
>>
>> Personally, I suggest that you leave the worker as a static worker, and
>> leave it always running when the extension is active. If it isn't doing
>> anything, have it sleep on its latch, then set its latch from other
>> processes when something interesting happens. (You can put the process
>> latch from PGPROC into your shmem segment so you can set it from elsewhere,
>> or allocate a new latch).
>>
>> This is my first venture into writing C code for postgres, so I'm not
>>> familiar with a lot of the internals yet. I read
>>> http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it
>>> mentioning how you can check the status of a BGW launched dynamically and
>>> the function to terminate one, but I'm not clear how how you can get the
>>> information on a currently running BGW to do these things.
>>>
>>
>> You can't. It's a pretty significant limitation in the current API.
>> There's no way to enumerate bgworkers via the bgworker API, only via PGPROC.
>>
>>
>>> I used the worker_spi example for a lot of this, so if there's any
>>> additional guidance for a better way to do what I've done, I'd appreciate
>>> it. All I really have it doing now is calling the run_maintenance()
>>> function at a defined interval and don't need it doing more than that yet.
>>> <http://www.keithf4.com>
>>>
>>
>> The BDR project has an extension with much more in-depth use of
>> background workers, but it's probably *too* complicated. We have a static
>> bgworker that launches and terminates dynamic bgworkers (per-database) that
>> in turn launch and terminate more dynamic background workers
>> (per-connection to peer databases).
>>
>> If you're interested, all the code is mirrored on github:
>>
>> https://github.com/2ndquadrant/bdr/tree/bdr-plugin/next
>>
>> and the relevant parts are:
>>
>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.c#L640
>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_perdb.c
>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_supervisor.c
>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_shmem.c
>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401
>> https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h
>>
>> ... but there's a *lot* of code there.
>>
>> --
>> Craig Ringer http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>
> Craig,
>
> Thanks for the response! Definitely cleared up a lot of questions I had
> regarding how to interact with currently running BGWs. Glad to know I can
> at least stop banging my head against the desk about that. I've still got a
> lot to learn as far as how to interact with shared memory, but now that I
> know that's the path I have to go down, I'm fine with that.
>
> My current plan now after your response this this:
>
> - Statically launch master BGW with shared_preload_libraries
> - Use dynamically launched BGW for each database that pg_partman will run
> on in the cluster. My previous idea of restricting it to one BGW would
> likely have stopped it from ever working on more than one database in a
> cluster.
> - Will see if I can create a function that polls the cluster for currently
> existing databases that actually have pg_partman installed. This should
> eliminate the need for a GUC naming the databases to run for. Should allow
> handling if a database is renamed as well. This way, as soon as the
> extension is created on a database, it should hopefully "just work" and
> start managing it.
>
> 9.4 is my targeted release to support for a while, so I'll just have to
> deal with the shortcomings you mentioned. Does the above sound like it
> could work then?
>
> --
> Keith Fiske
> Database Administrator
> OmniTI Computer Consulting, Inc.
> http://www.keithf4.com
>
>
So I've made some progress and thought I'd found a way around requiring the
use of shared memory (for now).

https://gist.github.com/keithf4/3dbda40f7f941ada7ffe

The idea is there will be a main BGW that is always running in a loop. At
the interval time period, the main loop will parse a GUC variable to get a
list of databases and launch a dynamic BGW for each one to run the
partition maintenance if pg_partman is installed.

I looked up the method that the shared_preload_libraries GUC uses to parse
a csv list of values starting on line 238 with making a copy of the GUC
variable to parse out
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L238

Everything within the pg_partman_bgw_main() function can read the value
fine, all the way to the point where it assigns the dbname variable here
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L251

However, once I try and pass this variable to the dynamic BGW function, it
seems to lose access to it. An example from my postgres logs:

2015-04-21 17:18:57 EDT [] [6866]: [6-1] user=,db=,e=00000 LOG: starting
background worker process "pg_partman master background worker"
2015-04-21 17:18:57 EDT [] [6866]: [7-1] user=,db=,e=00000 LOG: database
system is ready to accept connections
2015-04-21 17:18:57 EDT [] [6875]: [1-1] user=,db=,e=00000 LOG: pg_partman
master background worker master process initialized with role keith on
database template1
2015-04-21 17:18:57 EDT [] [6872]: [1-1] user=,db=,e=00000 LOG: autovacuum
launcher started
2015-04-21 17:19:17 EDT [] [6875]: [2-1] user=,db=,e=00000 DEBUG: Dynamic
bgw launch begun for keith
2015-04-21 17:19:17 EDT [] [6866]: [8-1] user=,db=,e=00000 LOG:
registering background worker "pg_partman dynamic background worker
(dbname=keith)"
2015-04-21 17:19:17 EDT [] [6866]: [9-1] user=,db=,e=00000 LOG: starting
background worker process "pg_partman dynamic background worker
(dbname=keith)"
2015-04-21 17:19:17 EDT [] [6877]: [1-1] user=,db=,e=00000 DEBUG: Before
run_maint initialize connection for db @òö
2015-04-21 17:19:17 EDT [] [6877]: [2-1] user=,db=,e=3D000 FATAL: database
" @òö " does not exist
2015-04-21 17:19:17 EDT [] [6866]: [10-1] user=,db=,e=00000 LOG: worker
process: pg_partman dynamic background worker (dbname=keith) (PID 6877)
exited with exit code 1
2015-04-21 17:19:17 EDT [] [6866]: [11-1] user=,db=,e=00000 LOG:
unregistering background worker "pg_partman dynamic background worker
(dbname=keith)"

You can see the first line of the pg_partman_bgw_run_maint() function just
sees the variable as junk
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L312

Doing some debugging, I've narrowed down the issue back to where the
pstrdup() makes a copy of the GUC variable. If I just have a single, valid
database set for pg_partman_bgw.dbname like this in postgresql.conf

pg_partman_bgw.dbname = 'keith'

And I then replace this line
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L262
with
worker.bgw_main_arg = CStringGetDatum(pg_partman_bgw_dbname);
everything works fine

2015-04-21 17:40:18 EDT [] [7113]: [8-1] user=,db=,e=00000 LOG:
registering background worker "pg_partman dynamic background worker
(dbname=keith)"
2015-04-21 17:40:18 EDT [] [7113]: [9-1] user=,db=,e=00000 LOG: starting
background worker process "pg_partman dynamic background worker
(dbname=keith)"
2015-04-21 17:40:18 EDT [] [7127]: [1-1] user=,db=,e=00000 DEBUG: Before
run_maint initialize connection for db keith
2015-04-21 17:40:18 EDT [] [7127]: [2-1] user=,db=,e=00000 DEBUG: After
run_maint initialize connection for db keith
2015-04-21 17:40:18 EDT [] [7127]: [3-1] user=,db=,e=00000 DEBUG: Checking
if pg_partman extension is installed in database: keith
2015-04-21 17:40:18 EDT [] [7127]: [4-1] user=,db=,e=00000 LOG: pg_partman
dynamic background worker (dbname=keith) dynamic background worker
initialized with role keith on database keith
2015-04-21 17:40:18 EDT [] [7127]: [5-1] user=,db=,e=00000 LOG: pg_partman
dynamic background worker (dbname=keith): SELECT
partman.run_maintenance(p_analyze := true, p_jobmon := false) called by
role keith on database keith
2015-04-21 17:40:18 EDT [] [7113]: [10-1] user=,db=,e=00000 LOG: worker
process: pg_partman dynamic background worker (dbname=keith) (PID 7127)
exited with exit code 0
2015-04-21 17:40:18 EDT [] [7113]: [11-1] user=,db=,e=00000 LOG:
unregistering background worker "pg_partman dynamic background worker
(dbname=keith)"

Obviously I want to be able to support more than a single database so I
need a method of splitting the GUC variable. I even tried using a simpler
method a coworker showed me using strtok_r() instead of this List method,
but had the same result. All the examples in other tools I've seen for the
bgw_main_arg value have been Int32 values (including your BDR tool). Is
there any way to pass a string as the argument bgw_main_arg?

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Keith Fiske 2015-04-21 23:12:15 Re: Background worker assistance & review
Previous Message Jacek Wielemborek 2015-04-21 21:05:53 Performance tuning assisted by a GUI application